…seven VL (four Vκ and three Vλ) germline families cover the vast majority of the human antibody diversity used. A consensus sequence was … accessible for diversification, the synthetic genes were designed to be modular and mutually compatible by introducing unique restriction endonuclease sites flanking the CDRs. Molecular modeling showed that all canonical classes were present. We could show that all master genes are expressed as soluble proteins in the periplasm of E. coli. A set of antibody phage display libraries totalling 2 × 10^9 members was created after cloning the genes in all 49 combinations into a phagemid vector itself devoid of the restriction sites in question. Diversity was created by replacing the VH and VL CDR3 regions of the master genes by CDR3 library cassettes, generated from mixed trinucleotides and biased toward natural human antibody CDR3 sequences. The sequencing of 25… … the modular design of all master genes, either single binders … knowledge of the particular sequence, using pre-built CDR cassette libraries. The small number of 49 master genes will allow future improvements to be incorporated quickly, and the separation of the …
The selection of antibody fragments from libraries (e.g. … phage display (Smith & Scott, …), ribosome display (Hanes & Plückthun, 1997), bacterial display (…, 1997)) has proven to be a successful alternative to classical hybridoma technology (for recent reviews, see …). Because phage display was developed first, it has been improved the furthest, especially in the antibody field. Conventional hybridoma technology is likely to be superseded by a combination of th…; these approaches are faster, involve no animals, yield antibodies of at least comparable affinities and also work with self-antigens or toxic molecules (Hoogenboom et al., 1998). The selection of antibodies must start from an initial library. Here, we describe the construction of such a library by total gene synthesis, based on a structural analysis of the human antibody repertoire.

Human antibodies are of particular interest, since they are considered to be valuable for therapeutic applications (Carter & Merchant, 1997), avoiding the HAMA (human anti-mouse antibody) response frequently observed with rodent antibodies. Although humanization of rodent antibodies through protein engineering can successfully retain the affinity and specificity of the parental molecule (Baca et al., 1997), this strategy is time-consuming and still does not yield fully human antibodies.

Previous phage display libraries of human antibodies have been generated from immunized donors (Barbas & Burton, 1996), germline sequences (Griffiths et al., 1994) or naive Ig repertoires (Vaughan et al., 1996; Sheets et al., 1998; de Haard et al., 1999). Selection from these libraries by phage display has yielded antibodies against haptens, peptides and proteins. While these libraries have all been successful, their uncontrollable composition and problems with the subsequent expression of the antibodies (see …) … protein engineering approach to solve the problem.

… high-affinity antibodies … library size (Perelson …). This relation may not be tractable by theoretical considerations, as it may be antigen-dependent. Consequently, successful "one-pot" libraries have all been large (Griffiths et al., 1994; Vaughan et al., 1996; Sheets et al., 1998; de Haard et al., 1999). It is important to note that only the functional library size, i.e. the number of correctly assembled clones without any frameshift, stop codon or deletion, contributes to the diversity. This number can be orders of magnitude below the apparent diversity usually reported, which is normally obtained by counting the number of transformants.
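The gap between apparent and functional library size is a matter of simple scaling. The following Python sketch assumes that the fraction of correctly assembled clones is estimated by sequencing a sample of the unselected library; the function names and every number in it are hypothetical and are not data from this work.

```python
def fraction_correct(n_sequenced: int, n_correct: int) -> float:
    """Fraction of sequenced clones free of frameshifts, stop codons and deletions."""
    if n_sequenced <= 0 or not 0 <= n_correct <= n_sequenced:
        raise ValueError("require n_sequenced > 0 and 0 <= n_correct <= n_sequenced")
    return n_correct / n_sequenced


def functional_library_size(transformants: int, correct_fraction: float) -> float:
    """Scale the apparent size (counted transformants) by the fraction of correct clones."""
    if not 0.0 <= correct_fraction <= 1.0:
        raise ValueError("correct_fraction must lie between 0 and 1")
    return transformants * correct_fraction


# Hypothetical example: 10^9 transformants, but only 5 of 100
# sequenced clones turn out to be correctly assembled.
apparent = 10**9
functional = functional_library_size(apparent, fraction_correct(100, 5))
print(f"apparent {apparent:.1e}, functional {functional:.1e}")
```

With these made-up numbers the functional size is 20-fold below the transformant count; smaller correct fractions widen the gap toward the orders of magnitude mentioned above.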

It has been shown that the Escherichia coli expression yields of functional antibody fragments can vary dramatically, even if the antibody gene is expressed in the same format, vector and expression strain. This effect has been shown to … (V-C) interface and … responsible for … aggregation or even toxic effects on the E. coli cells, hence leading to poor expression yields. Mutating those residues improved expression … without … properties (De… et al., 19…; …, 1995; Ulrich et al., 1995; … & Plückthun, 19…; …). Since … phage display depends on correctly folded antibodies, there is selection against poor folders (… et al., 1994; Jackson et al., 1995; Jung & Plückthun, 19…; …), and thus the functional library size will be decreased. However, the selection is clearly not stringent enough to ensure that every … selected from a phage display library will have acceptable folding properties. Thus, to maintain diversity and secure reasonable expression properties of the selected molecules, it would be advantageous to create anti… … that of other naive libraries.

The humoral immune system, however, does not work by the "single-pot" approach (Nissim et al., 1994), but rather uses an evolutionary strategy. The initial antigen-independent variability is first generated during B-cell development by … rearrangements (V(D)J-joining), leading to more than 10^9 different molecules at any one time in a human being (Winter, 1998). After a B-cell is activated, the antigen-driven process of somatic mutation is initiated (Rajewsky, 1996), and remarkable improvements in binding can be found. It has been shown that mutations occurring in CDRs 1 and 2 are preferentially selected (Wagner & …; …, 1998), as their diversity in the initial germline variants is much more limited than that of the CDR3s (Tomlinson et al., 1996). The design of an artificial library should make it convenient to follow this same approach. Indeed, previous experiments with peptides (Cwirla et al., 1997), RNA aptamers (He et al., 1996) and antibodies (Schier et al., 1996a; Hanes et al., 1998) have shown that the evolutionary approach and, in the case of antibodies, CDR walking (Yang et al., 1995; Schier et al., 1996a; Wu et al., 1998) can dramatically improve affinities. However, in the absence of suitably engineered genes, such an optimization can be extremely laborious.

The human antibody germline repertoire has recently been completely sequenced. There are … heavy-chain V genes located on chromosome 14 (Tomlinson et al., 1992; Matsuda & Honjo, 1996), which can be grouped into six subfamilies according to sequence homology. About 40 functional VL kappa genes comprising seven subfamilies are located on chromosome 2 (Cox …; … & Lefranc, 1998), and about 30 functional VL lambda genes grouped into ten subfamilies are located on chromosome 22 (Williams et al., 1996; Kawasaki et al., 1997; Pallares et al., 1998). The groups vary in size from one member (e.g. VH6 and Vκ4) up to 22 members (VH3), and … degree of sequence homology. By comparing rearranged sequences of human antibodies with their germline … (… Ignatovich et al., …), … human germline genes are never or rarely used during an immune response.

In structural terms, the VH and VL domains comprising the antigen-binding Fv moiety (see Figure 1) share a common … in their central portion, which is almost perfectly superimposable, even when fragments from different species are compared (Chothia et al., 1998). Larger differences are observed only in the conformation of the CDRs, and it has been shown in a series of studies (Chothia & Lesk, 1987; Chothia et al., 1989; Al-Lazikani et al., 1997) that all CDRs except VH CDR3 adopt only a few distinct conformations. Hence, the repertoire of conformations is limited to a relatively small number of discrete structural classes, depending on both the CDR length and the so-called canonical amino acid residues (Chothia …).

… design, designated HuCAL (Human Combinatorial Antibody Libraries). Each of the human VH and VL subfamilies that is frequently used during an immune response is represented by one consensus framework, resulting in seven HuCAL master genes for heavy chains and seven for light chains, and thus 49 combinations. All genes were made by total synthesis, thereby taking into consideration codon usage, unfavorable residues that promote protein aggregation, and unique and general restriction sites within all genes, leading to modular genes that contain readily accessible CDRs and can easily be converted into different antibody formats.

A first set of antibody libraries based on the HuCAL concept was created by randomizing both the VH and VL CDR3 … of the 49 … using trinucleotide … (…, 1994), which leads to high-quality libraries. … (version 1) were extensively characterized …, expression behavior and numerous … experiments against a wide variety of …



Analysis of the human antibody repertoire 



Sequence analysis

Amino acid sequences from variable domains of human immunoglobulins were collected from Kabat (Kabat et al., 1991; Johnson et al., 1996) and GenBank (Benson et al., 1997), compiled into three databases, V heavy chain (VH), V kappa (Vκ) and V lambda (Vλ), and aligned using the … numbering system. For each of the three … sequences … whenever … positions had been determined, giving 3…6, 149 and 675 entries for Vκ, Vλ and VH, respectively. … germline sequences … (48, 26 and 43 entries for Vκ, Vλ and VH, respectively), as the complete loci (see Cook & Tomlinson, 1995) had not been published at that time. Finally, all known D and J sequences were collected. Although the design was started before the complete germline repertoire was known, the availability of the whole repertoire and a larger number of rearranged sequences would not have influenced the library design, as was demonstrated by an analysis based on the complete germline repertoire and a larger database (846, 413 and 1201 entries for Vκ, Vλ and VH, respectively) of human rearranged sequences (see Figure 2).

… the binning into families is somewhat arbitrary, depending on how the homology cutoff between … is defined. For Vκ, seven families were established, Vλ was divided into eight families and VH into six families. The single VH germline gene of the VH7 family (van Dijk et al., 1993) was included in the VH1 family, since the genes of the two families are highly homologous. Upon more detailed analysis, considering canonical CDR conformations and canonical framework residues as well as gene usage (see below), the number of families was raised to seven for VH, but reduced to four for Vκ and three for Vλ.
To further examine the concept of constructing HuCAL using the … of sequence space as an efficient means to engineer library diversity, it was important to test the usage of the … germline genes in … antibodies. By counting the number of differences between … germline sequences, the … counterpart was identified for each rearranged sequence. … VL sequences (3… Vκ and 131 Vλ) could be clearly assigned to germline counterparts.
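Germline assignment by counting differences can be sketched as follows. This Python version assumes the sequences are already aligned to a common numbering (equal lengths) and treats a tie between two germlines, or a best hit above an optional difference cutoff, as "not clearly assigned"; the cutoff, the tie rule and the toy sequences are assumptions, not the criteria used in this work.

```python
from typing import Dict, Optional, Tuple


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_germline(
    rearranged: str,
    germlines: Dict[str, str],
    max_diff: Optional[int] = None,
) -> Optional[Tuple[str, int]]:
    """Return (germline name, number of differences) for the closest germline.

    Returns None when the assignment is not clear: two germlines tie for
    the fewest differences, or the best hit exceeds max_diff.
    """
    if not germlines:
        raise ValueError("no germline sequences given")
    scored = sorted(
        (count_differences(rearranged, seq), name) for name, seq in germlines.items()
    )
    best_diff, best_name = scored[0]
    if len(scored) > 1 and scored[1][0] == best_diff:
        return None  # ambiguous: two germlines are equally close
    if max_diff is not None and best_diff > max_diff:
        return None  # too divergent to assign confidently
    return best_name, best_diff


# Toy, pre-aligned sequences (hypothetical, not real germline genes).
germlines = {"GL_A": "QVQLVQSG", "GL_B": "EVQLLESG"}
print(assign_germline("QVQLVESG", germlines))  # ('GL_A', 1)
print(assign_germline("KVQLVESG", germlines))  # None (tie: 2 vs 2)
```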
[Figure legend, partially recovered] … et al. (1996), and CDR definitions conform to Kabat et al. (1991). Complementarity determining regions: CDR1, CDR2 and CDR3 …
Our results (see Table 1 and Figure 2) confirm earlier observations (…, 1994; Ignatovich et al., 1997). The VH germline gene usage was found to be restricted to about 12 genes from five subfamilies, which are used in approximately 80% of all cases. The VH2 family is only rarely used. Only four of the Vκ germline families were found to be used, and out of these only seven genes were used frequently (>1%). The Vλ germline gene usage was found to be restricted to three families, which are used in 93% of all cases, and five genes from these three families were used most frequently (Table 1). We concluded that the vast majority (98% of all VH, more than 99% of all Vκ and more than 93% of all Vλ) of human antibodies are derived from only five VH and seven VL families (four Vκ and three Vλ). Although the three germline genes of the VH2 family are not frequently used, we decided to cover all six VH families with our consensus approach, and therefore included this family in the further analysis.
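The usage figures above amount to ranking families by the fraction of rearranged sequences assigned to them and asking how few families reach a given coverage. A Python sketch with invented counts follows; the family names, counts and threshold are placeholders, not the data of Table 1.

```python
from typing import Dict, List, Tuple


def usage_fractions(counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Families sorted by the fraction of rearranged sequences assigned to them."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences counted")
    return sorted(
        ((family, n / total) for family, n in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )


def families_for_coverage(counts: Dict[str, int], threshold: float) -> Tuple[List[str], float]:
    """Smallest set of most-used families whose combined usage reaches threshold."""
    chosen: List[str] = []
    covered = 0.0
    for family, fraction in usage_fractions(counts):
        if covered >= threshold:
            break
        chosen.append(family)
        covered += fraction
    return chosen, covered


# Invented counts of rearranged sequences per family (illustration only).
counts = {"F1": 50, "F2": 30, "F3": 15, "F4": 5}
chosen, covered = families_for_coverage(counts, 0.9)
print(chosen, round(covered, 3))  # ['F1', 'F2', 'F3'] 0.95
```

Because families are taken in decreasing order of usage, this greedy loop returns the smallest number of families that reaches the threshold.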



The strategy of the synthetic library approach 
was therefore to represent each family by one 
representative member, subject to verification of 
the structural consequence of the distribution of 
CDR conformations (see the next section). 

Structural analysis 

Despite their great variability in length and sequence, the antigen-binding loops, denoted CDRs (complementarity determining regions), have been shown to adopt only a limited number of main-chain conformations, termed canonical structures (Chothia et al., 1989). The adopted structure depends on both the CDR length and the identity of certain key amino acid residues, both in the CDR and in the contacting framework, involved in its packing. The six VH, four Vκ and three Vλ germline families, as defined above from the dendrogram analysis, were therefore analyzed for the canonical structures of the CDRs that they were predicted to encode, in order to define the structural repertoire covered by these families (Table 1).
Table 1. … types of canonical structures …

In the following, we will use the CDR definitions given by Kabat et al. (1991) (see also Figure 1) and the sequence numbering according to structural criteria defined by Chothia (Chothia et al., 1992; … Tomlinson et al., …; Williams et al., 1996).

The structural repertoire of the human VH sequences was previously analyzed in detail by Chothia et al. (1992). In total, three conformations of CDR1 (H1-1, H1-2 and H1-3) and five conformations of CDR2 (H2-1, H2-2, H2-3, H2-4 and H2-5) have been defined, and the observed combinations have led to the conclusion that almost all sequences have one of seven main-chain folds. For the highly diverse CDR3, which is encoded by the D and J segments and by non-encoded nucleotides (N-region diversity), structural families have been defined only very recently (Morea et al., 1998; Oliva et al., 1998), but structural predictions do not yet approach the accuracy seen for the canonical folds of the five other CDRs.

All members of the VH1 family encode the CDR1 conformation H1-1, but differ in their CDR2 conformation: both the H2-2 and the H2-3 conformations were found in five germline genes. Since these two types of CDR2 conformation are defined by different amino acids at position 71, located in framework 3, we divided the VH1 subfamily into two further subfamilies: VH1A with CDR2 conformation H2-2 (alanine at position 71) and VH1B with conformation H2-3 (arginine at position 71). Upon model building (see below), we decided to introduce both types into the library design and to construct both a VH1A and a VH1B master gene (see below).
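The VH1 split reduces to a one-residue rule. A minimal Python sketch, assuming sequences have already been numbered so that framework position 71 can be looked up by its label; the dictionary input format and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Residue at framework position 71 -> (master-gene label, CDR2 canonical class),
# as stated in the text for the VH1 family.
VH1_SPLIT: Dict[str, Tuple[str, str]] = {
    "A": ("VH1A", "H2-2"),  # alanine at position 71
    "R": ("VH1B", "H2-3"),  # arginine at position 71
}


def vh1_subfamily(numbered: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Classify a VH1 sequence given a map from position label to residue.

    Returns None if position 71 is missing or carries a residue that the
    rule does not cover.
    """
    residue = numbered.get("71")
    if residue is None:
        return None
    return VH1_SPLIT.get(residue)


print(vh1_subfamily({"71": "R"}))  # ('VH1B', 'H2-3')
```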

The members of the VH2 family were all predicted to have the conformations H1-3 and H2-1 in CDR1 and CDR2, respectively.



Figure 2. Coverage of germline … by HuCAL (panels: VL kappa, VL lambda, VH). … were taken from VBase (http://www.mrc-cpe.cam.ac.uk/imt-doc/public/INTRO.html) and aligned to the 14 HuCAL sequences. The Phylip (http://evolution.genetics.washington.edu/phylip.html) and ClustalW (see ftp://ftp.ebi.ac.uk/pub/software/mac/clustalw/) phylogeny program packages were used to generate separate unrooted trees for the VL kappa, VL lambda and VH sequences. Percentages indicate the fraction of rearranged sequences in the database that cluster within the different germline subgroups. For these calculations, we used a database of rearranged sequences with 846 VL kappa, 413 VL lambda and 1201 VH sequences … The difference to 100% in the case of VL kappa (…5%) and lambda (76.4%) is due to rarely used germline subfamilies that are not represented by the HuCAL master genes.


The CDR1 conformation of the VH3 family … H2-3, H2-4). In these CDR2 conformations, the … types were found … while … of CDR3 … as well as the length variation … Therefore, the VH3 family is best represented by a sequence containing the canonical conformations H1-1 and H2-3, even though the more groove-like shape of … the longer CDR-H2 types may be introduced later by CDR shuffling.

The VH4 family was predicted to contain three types of CDR1 conformation, namely H1-1, H1-2 and H1-3. The CDR1 canonical framework residue 26 was found to be glycine in all cases, and the CDR1 loop conformation is defined solely by residues located in this region. Since 62% of … VH4 … type of CDR1, this conformation was chosen to represent the VH4 family. The CDR2 conformation of the VH4 members was found to be H2-1 in all cases.

The two members of the VH5 family were found to have the conformations H1-1 and H2-2, and the single germline gene of the VH6 family had the conformations H1-3 and H2-5 in CDR1 and CDR2, respectively. Hence, in structural terms, the frequently used members of the six VH families can be represented by seven sequences, since only the VH1 family contained two types of canonical CDR fold defined by residues in the framework region, and since VH3 and VH4 were each decided to be represented by a single type. The canonical conformations not present in the design can be incorporated later during CDR library generation, since the key residues for those conformations are part of the CDR itself.

The structural repertoire of the human Vκ germline sequences was analyzed by Tomlinson et al. (1995). There are four conformations of CDR1, which are defined by the length of the loop (7, 8, 12 and 13 amino acid residues) and the nature of residues 2, 25, 29, 33 and 71. The CDR2 loop of human Vκ domains is only three amino acid residues long in all cases and is predicted to adopt a single canonical fold. Most human Vκ germline segments also encode a single conformation of the CDR3 loop, which is stabilized by the conserved cis-proline 95, but other conformations in … arise due to the process of V-J joining and the potential loss of this proline residue. Since the CDR3 region was planned to be randomized for library generation, this area was not considered in the consensus sequence design. Hence, the structural repertoire of Vκ domains is essentially defined by the conformation of the CDR1 region. All members of the Vκ1 family contained a seven-residue CDR1 (L1-2), and the frequently used members of the Vκ2 family contained a 12-residue CDR1 (L1-4). The members of the Vκ3 family contained either a seven-residue (L1-2) or an eight-residue (L1-6) CDR1. Since the framework residues that define the CDR1 conformation are identical in both types, and 6…% of the rearranged Vκ3 sequences contained the CDR1 conformation L1-6, this type was chosen for the consensus sequence. The single germline member of the Vκ4 family contained a 13-residue CDR1 (L1-3).
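For Vκ, the CDR1 class follows mainly from the loop length, and the representative for a family with two classes (Vκ3) was chosen by majority. The Python sketch below encodes only the length-to-class pairs stated above. It deliberately skips the key-residue check (positions 2, 25, 29, 33 and 71), so it is a first-pass lookup under that simplifying assumption, and the sample lengths are invented.

```python
from collections import Counter
from typing import Iterable

# CDR1 loop length (residues, as counted in the text) -> canonical class.
VK_CDR1_CLASS = {7: "L1-2", 8: "L1-6", 12: "L1-4", 13: "L1-3"}


def vk_cdr1_class(loop_length: int) -> str:
    """First-pass class lookup from loop length alone (key residues not checked)."""
    if loop_length not in VK_CDR1_CLASS:
        raise ValueError(f"no Vkappa CDR1 class listed for length {loop_length}")
    return VK_CDR1_CLASS[loop_length]


def majority_class(loop_lengths: Iterable[int]) -> str:
    """Most frequent CDR1 class among a family's rearranged sequences."""
    classes = Counter(vk_cdr1_class(n) for n in loop_lengths)
    if not classes:
        raise ValueError("no sequences given")
    return classes.most_common(1)[0][0]


# Hypothetical Vk3-like sample: mostly 8-residue loops, some 7-residue loops.
print(majority_class([8, 8, 7, 8, 7, 8]))  # L1-6
```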

The structural repertoire of the human Vλ germline sequences was analyzed by Williams et al. (1996). The three families analyzed here encode identical conformations of the CDR2 loop. The CDR3 loop conformation is thought to be more highly variable, as there is some length variation and a cis-proline residue … Since this part was planned to be randomized for library generation, this area was not considered in the consensus sequence design. Although the CDR1 … of the Vλ1 family contains either 13 or 14 amino acid residues, it is thought to adopt a single conformation, since the canonical key residues are conserved and the additional insertion of one residue has little effect on the overall structure (Chothia & Lesk, 1987). The CDR1 length of 13 residues, which is found in more than 90% of all rearranged Vλ1 sequences, was chosen for the Vλ1 consensus. The members of the Vλ2 and Vλ3 families each encode a single defined type of CDR1 loop structure: the Vλ2 family encodes a CDR1 loop of 14 residues, and the CDR1 loop of the Vλ3 family is 11 residues long.

In summary, of the eight different pairs of CDR1-CDR2 conformations encoded by the frequently used Vκ and Vλ germline genes, seven could be represented by four Vκ and three Vλ consensus sequences. The remaining CDR1 conformation (seven-residue CDR1 loop in the Vκ3 family) is not defined by canonical key residues in the framework region and can therefore be inserted into the Vκ3 consensus sequence during library generation. Of the 11 different family-specific pairs of CDR1-CDR2 conformations found in the six VH germline families, seven could be covered by dividing the VH1 family into two families (VH1A and VH1B). The remaining four pairs (two in the VH3 and two in the VH4 family) were either not found in rearranged sequences or are defined by the CDRs themselves, and will therefore have to be created during the construction of CDR libraries. Hence, the structural repertoire of the human V genes used could be covered by 49 (7 VH × 7 VL) different frameworks.
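The 7 × 7 framework count can be made explicit by enumerating all heavy/light pairings of the master genes. In this Python sketch the labels follow the family names used in the text; the ASCII spelling ("Vk" for Vκ, "Vl" for Vλ) is an assumption made for readability.

```python
from itertools import product

# Master-gene labels as named in the text ("Vk" = V kappa, "Vl" = V lambda).
VH_MASTER = ["VH1A", "VH1B", "VH2", "VH3", "VH4", "VH5", "VH6"]
VL_MASTER = ["Vk1", "Vk2", "Vk3", "Vk4", "Vl1", "Vl2", "Vl3"]

# Every heavy chain framework paired with every light chain framework.
framework_pairs = [f"{vh}/{vl}" for vh, vl in product(VH_MASTER, VL_MASTER)]

print(len(framework_pairs))  # 49
print(framework_pairs[:3])   # ['VH1A/Vk1', 'VH1A/Vk2', 'VH1A/Vk3']
```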



Design of consensus frameworks

The compilation of rearranged sequences was first divided into separate groups (four Vκ, three Vλ and seven VH) according to the germline families described above. These protein sequence databases were used to compute the consensus sequence of each subgroup. By using the rearranged sequences instead of the germline sequences for calculating the consensus, the consensus was automatically weighted according to the frequency of usage. Additionally, frequently mutated and highly conserved positions could be identified.
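Because rearranged rather than germline sequences are used, each germline gene contributes to every column in proportion to how often it appears in the database, which gives the automatic weighting described above. The Python sketch below computes a column-wise majority consensus and a per-position conservation score over pre-aligned sequences; the alignment itself, the gap symbol "-" and tie-breaking by first occurrence are assumptions, not details given in the text.

```python
from collections import Counter
from typing import List, Tuple


def consensus_with_conservation(aligned: List[str], gap: str = "-") -> Tuple[str, List[float]]:
    """Majority residue per column plus the fraction of sequences that carry it.

    Gaps are ignored when counting; a column of gaps only yields the gap
    symbol. Ties are broken by first occurrence (Counter.most_common order).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    consensus, conservation = [], []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            consensus.append(gap)
            conservation.append(0.0)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(residues))  # low values flag variable positions
    return "".join(consensus), conservation


# Toy rearranged sequences (hypothetical): the more frequent variant wins.
cons, cov = consensus_with_conservation(["QVQL", "QVQL", "EVQL"])
print(cons)  # QVQL
```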
For the CDR1 and CDR2 regions, the consensus of the rearranged sequences was replaced with the amino acid sequence of one of the germline genes of the corresponding family. This procedure removes any bias, as the CDRs of rearranged and mutated sequences are known to be biased due to selection towards their particular antigens. In the case of V…, a few amino acid … were introduced in some of the chosen … structure …

… expression behavior, it was advantageous to first … CDR3-H and CDR3-L cassettes with defined dummy sequences. We chose the sequences QQHYTTPP and WGGDGFYAMDY for the VL and VH chains, respectively, which are derived from the antibody 4D5 (Carter et al., 1992a) and are known to be favorable for antibody folding in E. coli (Jung & Plückthun, 1997). Even though molecular modeling indicates that the omega loop from Vκ is not ideal in a Vλ framework because of steric clashes, good expression behavior could still be observed, demonstrating the robustness of the frameworks (see below).

For the framework 4 regions, encoded by the J-elements, the consensus of the rearranged sequences in each family was calculated and found to be identical for all families within VH, Vκ and Vλ, respectively. This shows that there is no correlation between V-usage and J-usage (Buskin et al., 1998). In all three cases, this consensus sequence was identical to at least one of the naturally occurring sequences encoded by joining elements, indicating that the sequence is able to exist.

Up to this point, we have described only the sequence information that was used to design the consensus sequences. It could therefore not be excluded that the consensus would lead to a molecule whose sequence might "jump" between different naturally occurring sequences, thereby creating artificial combinations of amino acid residues that are located far apart in the sequence but give rise to contacts in the three-dimensional structure. It was therefore essential to verify the sequences by structural means. Otherwise, the uncritical use of the algebraic consensus might obscure a hidden interaction between certain … residues, which can occur only in certain combinations. While this approach may also keep together residues that are linked only historically, it does safeguard against losing hidden long-range interactions (Saul & Poljak, 1993). As a first check, the most homologous rearranged sequence for each consensus sequence was identified by searching against the … sequences, and all positions where the consensus differed from this nearest rearranged sequence were inspected (see Materials and Methods). Furthermore, models for the seven VH and seven VL consensus sequences were built and analyzed according to their structural properties (see the next section). As a result of this analysis, the following residues were exchanged (given are the position according to Kabat's numbering scheme, the substitution performed and the name of the gene family): …, R…S (Vκ3) and V78T (Vλ3).

PROCHECK: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
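The first structural check, finding the closest rearranged sequence and listing every position where the consensus differs from it, can be sketched in Python as below. It assumes aligned sequences of equal length and one position label (for example a Kabat number) per residue; the sequences, labels and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def nearest_with_differences(
    consensus: str, database: Dict[str, str], positions: List[str]
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Closest database entry and its (position, consensus, entry) mismatches."""
    if len(positions) != len(consensus):
        raise ValueError("need one position label per consensus residue")
    if not database:
        raise ValueError("database is empty")
    # Fewest differences wins; ties are resolved by entry name for determinism.
    best = min(database, key=lambda name: (count_differences(consensus, database[name]), name))
    mismatches = [
        (pos, c, r) for pos, c, r in zip(positions, consensus, database[best]) if c != r
    ]
    return best, mismatches


# Toy example with hypothetical sequences and position labels.
db = {"seq1": "QVQLVESG", "seq2": "EVKLLESG"}
labels = [str(i) for i in range(1, 9)]
print(nearest_with_differences("QVQLVQSG", db, labels))
# ('seq1', [('6', 'Q', 'E')])
```

Each listed mismatch is a position that would then be inspected by hand, as described above.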

After the consensus protein sequences were designed, phylogenetic trees were built with the programs PHYLIP and ClustalW (Thompson et al., 1994). For this representation, we repeated the analysis of germline usage based on an updated database of rearranged human antibody sequences that was more than twice the size of the original database used for the design of the HuCAL sequences. Separate unrooted trees were built for the Vκ, Vλ and VH sequences (Figure 2). This analysis illustrates the strategy … sequence as if it were a member of the germline: its position in the sequence map is indicated, and it can be seen that it truly represents the family (Figure 2).

Molecular modeling and analysis 

To obtain more information about the packing, CDR conformations and framework properties, all seven VH frameworks, all four Vκ frameworks and the three Vλ frameworks were built by homology modeling … complete structural alignment … dependent … legend to Figure 3. Usually, the template with the highest resolution and the fewest mutations relative to the consensus sequence to be modeled was used. For all models, multiple templates were compared, such that the effect of mutations in any of the templates could be evaluated directly from the structural alignment. The experimental structures … corresponding to each of the HuCAL constructs are listed in Table 1 of the Supplementary Material.

In the models (see Figure 4), the dummy CDR3 sequences from the antibody hu4D5 (version 8) are shown (PDB file 1FVC). All models were checked with PROCHECK (Laskowski et al., 1993) and were shown to have no more residues in the less favorable regions of the Ramachandran plot than the template structures (some unfavorable torsion angles in loop regions … CDR2 in VH), as well as having no obvious cavities, … unusual exposed hydrophobic … set of standard variable domain hydrogen bonds.

Consistent with sequence considerations, model building predicted the great majority of canonical structures to be present when the critical residues were compared with the templates. More recent work (unpublished results), based on previous experimental observations from X-ray crystallography (…, 1993) and mutagenesis …, has revealed several more structural relationships within each VH domain, which may contribute to diversity. In particular, relationships between the nature of residues H6, H7 and H9, due to the different hydrogen-bonding pattern of H6 to the backbone, can transmit a conformational change through the protein via residues …, H82, H67 and H63. Our analysis showed that all types of conformations …



In the VH3 group of germline sequences, there is more variation in CDR2, because a group of human sequences carries an insertion of two amino acid residues (positions 52b and 52c). These antibodies might form more cleft-like binding pockets; this diversity is not present in the original library design, even though many other combinations of frameworks would be able to form cavities and clefts as well. Through the modular design, however, these longer CDR2 elements can easily be introduced by cassette mutagenesis.

An analysis in analogy to that reported by Nieba et al. … showed that the exposed residues at the V/C interface are already of low hydrophobicity in all consensus frameworks, consistent with their superior expression behavior in E. coli (see below). Moreover, many of the residues identified as crucial for stability and clearly selectable by phage display, such as Pro … (defining a conserved … first β-strand with a cis-peptide bond in Vκ … proline … at positions 8 and/or 9 in Vλ domains; see Spada et al., 1998), are present in all HuCAL sequences. Residue R H…, which is part of a conserved charge cluster and is frequently K in murine antibodies, where it leads to lower stability (see Proba et al., 1998), is … where the …

All … hydrogen bonds are present … Side-chain to side-chain: R H38 to …; … Interdomain: Q L38 to Q H39. In this listing, X refers to positions …



The relative orientation of VL with respect to VH is still poorly understood and will depend on the exact pairwise combination and on the specific CDR3 sequences. Monoclonal antibodies are frequently found with mutations within the interface. This introduces further uncertainty in building a model of the combining site, because a small deviation in angle can have a large effect at the top of the binding site. This variability of the relative orientation of the two domains is particularly large for Vλ domains and for Vκ domains lacking the cis-Pro at position L95, and is further modulated by non-tyrosine residues at position L49. The "elbow" of ordinary Vκ CDR3 inserts around L96 into a notch in VH and restricts the flexibility of the interface. Since the interface residues are highly conserved between all the consensus antibodies (see Figure 3), and since very similar frameworks are available as templates in the database, more reliable models may be possible for HuCAL antibodies than for antibodies further away from the consensus. This system of defined frameworks might, in addition, provide excellent access to studying the question of domain orientation experimentally.

Construction of the seven VH and seven VL master genes

The final result of the analysis described above was a collection of 14 amino acid sequences, which represent the frequently used antibody repertoire of the human immune system. These sequences were back-translated into DNA sequences. In a first step, the back-translation was carried out using only codons that are known to be used frequently in E. coli. In a second step, these gene sequences were examined for all possible restriction endonuclease sites that could be introduced without changing the corresponding amino acid sequence. This was done by compiling a database of all possible silent cleavage sites for each gene. Using this database, cleavage sites were selected that were located close to the CDR and framework borders and that could be introduced into all VH, Vκ or Vλ genes simultaneously at the same position. This was considered essential to the overall strategy, as CDRs (or frameworks) can then be shuffled within pools of sequences without even knowing the individual antibody sequence.
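The silent-site search described above can be illustrated with a short sketch. This is an illustration under stated assumptions, not the software used for HuCAL. It uses the standard genetic code rather than a restricted set of frequent E. coli codons, and the function name `silent_site_positions` is hypothetical. Each codon is constrained only by the site bases it overlaps, so a start position is feasible if every overlapped codon has at least one synonymous codon that matches those bases. As an example, Asp-Ile can be encoded as GAT ATC, which contains the EcoRV recognition sequence GATATC.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}

# Reverse map: amino acid -> all synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def silent_site_positions(protein, site):
    """Return 0-based DNA start positions at which the recognition sequence
    `site` can be created by synonymous codon choice alone, i.e. without
    changing the encoded `protein`."""
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        end = start + len(site)  # exclusive
        feasible = True
        for ci in range(start // 3, (end - 1) // 3 + 1):
            # Bases of codon `ci` that are fixed by the recognition site.
            fixed = {p - 3 * ci: site[p - start]
                     for p in range(3 * ci, 3 * ci + 3) if start <= p < end}
            # Codons are constrained independently, so one matching
            # synonymous codon per overlapped position is sufficient.
            if not any(all(codon[k] == b for k, b in fixed.items())
                       for codon in SYNONYMS[protein[ci]]):
                feasible = False
                break
        if feasible:
            hits.append(start)
    return hits


print(silent_site_positions("DI", "GATATC"))  # [0]
print(silent_site_positions("DV", "GATATC"))  # []
```

To screen a whole master gene, the same function would be run for every candidate recognition sequence. A site then qualifies for the modular design only if it is feasible at the same framework/CDR border in all genes of a group.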
In a few cases it was not possible to find a consensus cleavage site … .

In total, six amino acid residues were exchanged during this step (in VH6, … and Vλ3). Additionally, residues … of all three Vλ sequences were changed to aspartate-isoleucine in order to introduce an EcoRV site common to all Vλ genes. After this design, only one element remained where no common cleavage site could be found. For this region (the border between CDR2 and framework 3 in the VH sequences), two types of cleavage sites were used: … for VH1A, VH1B, VH4 and VH5, and NspV for VH2, VH3, VH4 and VH6.

During this analysis, several potential restriction endonuclease sites were identified that could be introduced into every gene of a given group without changing the amino acid sequence, but that were not located at the flanking regions of the CDR or framework elements. The introduction of these cleavage sites made the system more flexible for further improvements. Finally, each gene sequence was modified to remove, with the exception of the common restriction sites, all but one of the other sites (with a recognition sequence of five or more bases). This remaining unique site can serve as a "fingerprint site" to differentiate the genes by restriction digest. All these changes were again carried out without changing the corresponding amino acid sequence. The 14 final protein sequences, including the introduced restriction pattern, are shown in Figure 3.

The resulting consensus protein sequences were finally compared to the germline sequences. The 14 consensus sequences deviated from their closest germline counterparts by a mean of 4.9 (±3.6) residues. Thus, these consensus sequences are, on average, much more closely related to the germline sequences than the majority of rearranged sequences found in the database (mean deviation 14.7 amino acid residues). In contrast to the "original" germline sequences, however, our synthetic versions have all the advantages of sequences with known, unique restriction sites at the framework/CDR borders.
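The closest-germline comparison can be sketched as follows. The sketch assumes gap-free, pre-aligned sequences of equal length, so that a simple Hamming distance is meaningful; the actual comparison may have used a different alignment or scoring. The sequences are hypothetical toy fragments, not HuCAL or germline sequences.

```python
from statistics import mean, stdev


def hamming(a, b):
    """Number of mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_germline_deviation(consensus, germlines):
    """For each consensus sequence, the distance to its closest germline sequence."""
    return [min(hamming(c, g) for g in germlines) for c in consensus]


# Hypothetical, pre-aligned toy fragments.
consensus = ["EVQLVESGG", "QVQLVQSGA"]
germlines = ["EVQLVESGG", "QVQLVQSGS", "EVQLLESGG"]
d = closest_germline_deviation(consensus, germlines)
print(d)                            # [0, 1]
print(mean(d), round(stdev(d), 2))  # 0.5 0.71
```

The reported "4.9 (±3.6)" corresponds to this kind of mean and standard deviation, taken over the per-sequence minimum distances.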

The consensus gene fragments were then assembled from oligonucleotides by SOE-PCR (see Materials and Methods for details).

Gene segments encoding the human constant domains CH1 (subtype IgG1), Cκ and Cλ were designed with optimized codon usage and synthesized in order to create Fab fragments for display or expression (see Materials and Methods). After synthesis, the gene fragments were assembled and inserted individually into the expression vector pBS… as single-chain Fv genes containing identical dummy VH and VL CDR3s. The general format of the scFv genes is shown in Figure 5. All 49 master genes were also cloned in the reverse-oriented scFv format (VL-VH), as well as in the Fab format for future libraries (data not shown).

E. coli expression analysis

E. coli expression of the 49 scFv genes (all containing the same VH and VL CDR3s; … 1992a) was studied as described by Knappik & Plückthun (1995). All 49 master genes could be expressed as soluble proteins in the periplasm of E. coli, yielding a band of the correct size in FLAG Western blots of E. coli crude extracts (data not shown). This indicates that all 49 combinations are most likely capable of forming VH/VL pairs, since unpaired domains tend to … .

The ratio of soluble to insoluble expressed protein was quantified from Western blot experiments for each scFv gene, since this value has been shown to correlate with the expression behavior



Figure 3. Protein sequences of the HuCAL VH and VL master genes. The figure shows an alignment of the seven VL and seven VH master genes. CDRs are according to Kabat et al. (1991); more recently, the extended H3 loop has been defined to include residues 92 and 104. Further panels mark positions where more than 50% of the side-chain surface is solvent accessible. They also show the average loss of side-chain solvent-accessible surface upon formation of the VL/VH dimer interface, and upon formation of the VL/CL and VH/CH1 interfaces in the Fab fragment. Panel (f) refers to average Cα positions derived from Protein Data Bank structures (www.rcsb.org/pdb/).





Figure 4. Range of conformations covered by the HuCAL models. For comparison, 100 non-redundant VL and VH structures (mouse and human) were taken from the Protein Data Bank and aligned. (a) HuCAL VL models and (b) X-ray structures of kappa and lambda chains; (c) HuCAL VH models and (d) X-ray structures, color-coded according to the corresponding sequence pattern. The fourth conformation, which is not covered by the HuCAL models, is very rare in the analyzed sequences.



of antibody fragments (Knappik & Plückthun, 1995; Nieba et al., 1997; Jung & Plückthun, 1997). The HuCAL H3κ2 master gene was included as an internal control in each separate expression experiment. The results are given in Table 2. The HuCAL genes showed a higher ratio of soluble to insoluble protein than many antibody genes obtained from natural monoclonal antibodies and … .
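The following is a minimal sketch of the soluble-fraction calculation used for Table 2, expressed as soluble full-length scFv relative to the total. It assumes that the band intensities of the soluble and insoluble fractions have already been quantified on the same blot. The function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def percent_soluble(soluble, insoluble):
    """Soluble full-length scFv as a percentage of the total
    (soluble + insoluble) Western blot signal."""
    total = soluble + insoluble
    if total <= 0:
        raise ValueError("total signal must be positive")
    return 100.0 * soluble / total


def summarize_control(replicates):
    """Mean and sample standard deviation of repeated internal-control values."""
    return mean(replicates), stdev(replicates)


# Hypothetical band intensities (arbitrary units), not data from the paper.
print(percent_soluble(900, 100))  # 90.0
print(percent_soluble(330, 670))  # 33.0
```

The internal control (H3κ2, measured in every experiment) can be summarized with `summarize_control` to judge the variability between experiments.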

The ratio of soluble to insoluble protein ranges from 33% to 90% (Table 2). By contrast, natural antibody fragments show a wide range of ratios, and many of them have much lower ratios under similar experimental conditions (… et al., 199…). We could not find a correlation
Figure 5. Arrangement of the HuCAL scFv in the VH-VL orientation. The scFv gene cassette is preceded by a phoA signal sequence and a short FLAG tag. The two domains are fused by a flexible linker of 20 amino acid residues. Some of the unique restriction sites common to all master genes are shown, and the location of the CDR regions is indicated.



between the type of VL gene and the expression behavior of the scFv fragments. It seemed, however, that the genes with the VH3 or VH1A domains show higher soluble-to-insoluble ratios in almost all combinations (Table 2). These initial findings clearly need to be extended by a more detailed biophysical characterization.
The amounts of soluble protein produced, when … , ranged … (data not shown), indicating … yields for all combinations.

The CDRs introduced by randomization and selection may change the range of expression yields seen with the master genes. Even so, building the libraries on well-expressed frameworks increases the chance of selecting well-expressed binding antibodies and reduces large imbalances in display efficiency.

The dummy CDR3 sequence introduced into all VL genes was taken from a VL kappa gene (see above). This Vκ CDR3 contains a cis-proline residue at position 95, which creates an omega-loop that is normally not found in VL lambda CDR3s. Because this loop might influence the folding, and hence the expression behavior, of the corresponding scFv genes, a Vλ dummy consensus CDR3 cassette encoding the sequence QSYDSSIS was designed. It was used to replace the Vκ dummy CDR3 in the … scFv gene. Interestingly, however, no significant difference in expression yields could be detected (data not shown).

The expression behavior of two randomly chosen scFv genes (H2λ2 and H3κ2) was analyzed in more detail. These two genes were taken from panning experiments after library creation (see below) and therefore contained CDR3 sequences that differed from the dummy sequence of the master genes. Since both scFv fragments bind the antigen they were selected on, we could use ELISA experiments to determine the amount of active material in the lysates after different times of induction. The results are shown in Figure 6. After five hours of induction at 30 °C, the expression titer was 6 mg (H2λ2) and 10 mg (H3κ2) … cell leakiness. This expression yield is significantly higher than that reported for antibody fragments from other libraries (Griffiths et al., 1994; Vaughan et al., 1996).
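ELISA-based quantification of active scFv usually relies on a standard curve. The sketch below assumes a linear standard curve built from a purified reference of known concentration. The values are hypothetical, and the calibration procedure actually used is not described in this section.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve; only valid inside its linear range."""
    return (signal - intercept) / slope


# Hypothetical standards (ng of purified scFv per well) and ELISA signals.
standards_ng = [0, 1, 2, 3]
signal = [0.1, 0.3, 0.5, 0.7]
slope, intercept = fit_line(standards_ng, signal)
print(round(amount_from_signal(0.4, slope, intercept), 3))  # 1.5
```

Scaling the per-well amount by the lysate dilution and culture volume would then give a titer per liter of culture.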

Design and construction of CDR3 
library cassettes 

Our rational approach to creating an antibody library aims to obtain the largest possible structural diversity with the smallest possible number of molecules. At the same time, … . Furthermore, it was essential to introduce diversity at those residues that interact with the antigen. For the first set of libraries, we decided to randomize the CDR3 regions of the VH and VL genes simultaneously. These two regions form the inner circle of the antigen binding site and therefore show the highest frequency of antigen contacts in structurally known antibody-antigen complexes. In order to obtain the highest degree of diversity in



a The amount of soluble full-length scFv relative to the total amount obtained is given (in %), as determined from quantitative Western blots. The H3κ2 gene served as an internal control in each separate expression experiment and was analyzed altogether 18 times. For this gene, the mean …
Figure 6. Growth curves and expression kinetics for (a) an scFv derived from the H3κ2 HuCAL framework and (b) an scFv derived from the H2λ2 HuCAL framework. The growth curves (circles) were determined by measuring the absorbance at 600 nm. For comparison, the growth curve of the uninduced culture (open circles) is given. The scFv amount was calculated from the ELISA signal; the result of three measurements is given for each experiment.



the VH CDR3, which is also the most variable region in natural antibodies, we applied the following strategy for library generation. First, we designed VL CDR3 libraries of relatively low complexity, strongly biased towards the natural distribution of amino acids (see below). We inserted these into the VL master genes, aiming at a library size of about 10^7 members. Subsequently, we inserted a VH CDR3 library cassette with very high complexity into these VL libraries, so that every library member carries a unique sequence.

Since we used trinucleotides (Virnekäs et al., 1994) for the generation of the CDR3 library cassettes (see below), we could introduce any desired amino acid distribution at each position. We decided to first analyze the CDR3s in our database of rearranged human antibody sequences. We then combined this information with structural data for the library design, in order to bias the library towards the naturally found human antibodies.
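The database analysis that feeds the trinucleotide mixtures can be sketched as a position-specific frequency profile over CDR3s of a single length. In this illustration, cysteine is excluded and the remaining frequencies are renormalized into target molar fractions. This mirrors the equal-coupling assumption stated for the mixtures below, but the function name and the example sequences are hypothetical.

```python
from collections import Counter


def position_profile(cdr3s, exclude="C"):
    """Per-position amino acid frequencies for CDR3s of one length.

    Residues in `exclude` (cysteine by default) are dropped and the remaining
    counts renormalized, so each dict can serve as target molar fractions for
    a trinucleotide mixture, assuming equal coupling of all trinucleotides."""
    if len({len(s) for s in cdr3s}) != 1:
        raise ValueError("analyze one CDR3 length at a time")
    profile = []
    for column in zip(*cdr3s):
        counts = Counter(aa for aa in column if aa not in exclude)
        total = sum(counts.values())
        # A column containing only excluded residues yields an empty mixture.
        profile.append({aa: n / total for aa, n in sorted(counts.items())} if total else {})
    return profile


# Hypothetical eight-residue CDR-L3 sequences, for illustration only.
seqs = ["QQYNSYPT", "QQSYSTPL", "QHYDNLPL", "QQYCSYPT"]
for position, fractions in enumerate(position_profile(seqs)):
    print(position, fractions)
```

In practice, structural considerations then override the raw profile at some positions, for example by fixing canonical residues or limiting positions whose side-chains point away from the binding pocket.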

Vκ CDR3

A total of 382 sequences of rearranged antibodies from our initial internal database were analyzed. In the following, we use the numbering system and CDR definitions of Kabat et al. (1991). These do not correspond to the structural definitions (Chothia & Lesk, 1987; Barré et al., 1994; … et al., 1997).

A fraction of 72.3% of all CDR3s had a length of eight amino acid residues. The remaining sequences had CDR3 lengths of less than seven (1.8%), seven (7.3%), nine (17.3%), and ten (1.3%) residues. Because CDR3s of eight residues predominate, we decided to consider only that size for constructing the CDR3 library. The omega-loop structure of Vκ CDR3 is determined by a characteristic cis-proline residue at position 95, which is encoded in 96% of all κ germline genes. Two canonical structures, 1 and 2, are available for CDR3s of this length (Al-Lazikani et al., 1997). In canonical structure 1, residues 90 and 95 are predominantly occupied by glutamine and proline, respectively, whereas structure 2 is characterized by a cis-proline at position 94. Of all 382 sequences, about 87% had Q L90 and 78% had P L95, whereas P L94 was present in only 1%. Therefore, we decided to base the design of Vκ CDR3 on structure 1. Besides the canonical residues, position 89 showed a strong bias, with glutamine present in 89% of all sequences. Residues 89 and 90 lie in the β-strand and are therefore not part of the CDR-L3 loop, which comprises residues 91 to 96 (Chothia & Lesk, 1987). Within CDR-L3, a high degree of variability can be seen (except at position 95, mentioned before), with some preference for tyrosine at position 91. This agrees well with the antigen contact residues found in structurally known antibody-antigen complexes, where positions 91, 94 and 96 seem to play the most important role (see Figure 3).

In our design of the library (see Figure 7(b)), Q L90 was treated as a canonical residue: the side-chain of this glutamine does not contribute to the antigen-binding pocket but points in the opposite direction. In the trinucleotide mixture, we biased positions 90 and
Figure 7. Comparison between the designed and the experimentally determined compositions of the trinucleotide mixtures used at each randomized CDR3 position: (a) VH CDR3 (variable region numbered from H95 to H102); (b) Vκ CDR3; (c) Vλ CDR3. Each panel compares the amino acid composition planned for each mixture with that found in sequenced clones (see also the Supplementary Material). "Occupied" gives the number of amino acids encoded by the respective mixture and the number found in the sequenced clones, respectively.



95 strongly towards glutamine and proline, respectively. Only a limited set of trinucleotide codons was allowed at positions 92 and 93. Although many different residues can be found there, their side-chains point away from the VH CDR3 contact side. In contrast, 18 amino acids (all except cysteine and proline) were allowed at position 91, biased towards tyrosine. Proline is never found at position 91 in germline or rearranged sequences, possibly because P L91 would not allow the loop to form the correct conformation. Cysteine was omitted because it was almost never found and because disulfide formation might cause problems during phage panning and later expression. Accordingly, all amino acids except cysteine were allowed at positions 94 and 96. The residues at these three most strongly randomized positions point into the binding pocket. By focusing the diversity on positions most likely to contact the antigen, we reduced the overall theoretical diversity to 1.3 × 10^…, which ensured that the theoretical diversity would be covered by the actual library.

For the four Vκ consensus genes, three trinucleotide-containing oligonucleotides were synthesized. A single oligonucleotide was used for Vκ1 and Vκ3, since these genes differ only at position 85 (Vκ1: T85; Vκ3: V85). This position could therefore be encoded by a 1:1 mixture of two trinucleotides for threonine and valine. Structural inspection revealed that residue 85 has no contact with other residues, which makes it likely that exchanging these two similar amino acids would be tolerated. After library construction and sequencing of clones, we found the expected 1:1 ratio at this position (data not shown).
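Because each trinucleotide encodes exactly one amino acid, the theoretical diversity of such a cassette is the product of the number of codons allowed at each randomized position. The sketch counts only the three strongly randomized Vκ positions named above (91: 18 amino acids; 94 and 96: 19 each). The full design also includes limited or biased mixtures at other positions, whose codon numbers are not listed in this section, so the printed value is not the paper's figure.

```python
from math import prod


def theoretical_diversity(codons_per_position):
    """Number of distinct protein sequences encoded by a cassette in which each
    randomized position draws from an independent codon mixture with one
    codon per amino acid (as with trinucleotide building blocks)."""
    return prod(codons_per_position)


# Only the three strongly randomized Vκ positions (91, 94, 96) are counted here.
print(theoretical_diversity([18, 19, 19]))  # 6498
```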



For oligonucleotide synthesis, six different trinucleotide mixtures (see Figure 7(b)) had to be prepared, each comprising two to 19 codons in either biased or equal proportions. Initial results had suggested that different trinucleotides couple with different relative yields (Virnekäs et al., 1994). More controlled subsequent experiments, however, showed that these differences were not systematic (data not shown). The trinucleotide mixtures were therefore prepared directly at the desired molar ratios, implicitly assuming equal coupling yields. During oligonucleotide synthesis, the stepwise coupling ratio for the trinucleotide mixtures ranged from 95.5% to 97.5%, and the overall yield per oligonucleotide from 44% to 68%.
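The overall yield of full-length oligonucleotide is the product of the stepwise coupling efficiencies, so a few percent loss per trinucleotide step accumulates quickly. The minimal sketch below assumes independent coupling steps. The step counts in the example are hypothetical, because the length of each oligonucleotide is not given in this section.

```python
from math import prod


def overall_yield(step_efficiencies):
    """Fraction of full-length oligonucleotide after sequential couplings,
    assuming every coupling step is independent."""
    return prod(step_efficiencies)


# Hypothetical cassette: 10 trinucleotide-mixture couplings at 97.5% and
# 40 mononucleotide couplings at 99%.
steps = [0.975] * 10 + [0.99] * 40
print(round(overall_yield(steps), 3))
```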

After cassette preparation, restriction digest and purification, the cassettes were ligated into the four Vκ consensus genes using the unique restriction sites BbsI and MscI, and the ligation mixtures were electroporated into E. coli TG1 cells. We obtained 6 × 10^… independent colonies and hence almost complete coverage of the theoretical diversity. The quality of the cassettes was then checked by sequencing 235 independent clones. A total of 175 clones (75%) were completely correct and showed the planned library composition. Four clones contained an unplanned amino acid at one position, most likely due to single-base mutations introduced during cassette preparation. Three clones contained a one-base deletion and six clones a one-codon deletion in the trinucleotide-encoded region. The remaining incorrect clones had the library cassette inserted twice or in the reverse orientation, or contained one-base deletions in the 5' mononucleotide region of the oligonucleotide. To obtain more statistical data on codon incorporation, all codons originating from trinucleotide positions were analyzed. Figure 7(b) shows the result of that analysis. Overall, the observed codon composition was in good agreement with that planned at these positions.
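Whether a given number of transformants gives "almost complete coverage" can be estimated with the usual Poisson approximation. This approximation assumes that every theoretical variant is equally likely; biased mixtures make rare variants less likely to appear, so the estimate is optimistic. The function names are hypothetical. As a side check on the sequencing figures above, 175 of 235 clones is 74.5%, that is, about 75%.

```python
import math


def expected_coverage(n_transformants, diversity):
    """Expected fraction of theoretical variants present at least once,
    assuming all variants are equally likely (Poisson approximation)."""
    return 1.0 - math.exp(-n_transformants / diversity)


def transformants_needed(diversity, target_fraction):
    """Library size required to reach a target expected coverage."""
    return -diversity * math.log(1.0 - target_fraction)


# Three-fold oversampling is expected to cover about 95% of the variants.
print(round(expected_coverage(3e6, 1e6), 3))  # 0.95
```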



Vλ CDR3

… (according to Chothia & Lesk, 1987). We therefore decided to construct a CDR3 library comprising these different lengths. Analysis of the amino acid composition in the rearranged sequences revealed a high degree of variability at positions 93 to 96, and to a smaller extent at positions 89 to 92. For an antibody of canonical structure 1 (Chothia & Lesk, 1987; see Figure 3), the antigen contact residues indicated that positions 91, 94, and 96 seem to play the most important role. A single Vλ CDR3 oligonucleotide was designed for all three Vλ consensus genes, in which Q L89 and S L90 were kept constant. Position 92 … at that position and because its side-chain points away from the binding pocket. Residue 91, which packs against VH CDR3, was limited to the three amino acids most frequently found in the database (arginine, tryptophan and tyrosine). At positions 93 to 95B, an equimolar mixture of all amino acids except cysteine and tryptophan was allowed, since cysteine and tryptophan were never found at these positions in the rearranged sequences. Position 96 was completely randomized, except for cysteine. Accordingly, three different trinucleotide mixtures had to be prepared, comprising three biased codons, 18 codons or 19 codons (the latter two equally distributed). The three mixtures and their positions in the CDR3 are given in Figure 7(c).
On average, the stepwise coupling ratio for the trinucleotide mixtures was 98.9%, and the overall yield for the oligonucleotide was 80%. During oligonucleotide synthesis, we used four consecutive sub-stoichiometric couplings at the triplet position corresponding to residue 95A. This produced an oligonucleotide library covering CDR3 lengths between eight and 11 amino acid residues, with the smallest fraction having a CDR3 length of 11 residues. The theoretical diversity of these length variants ranged from … .

After cassette preparation, restriction digest and purification, the cassettes were ligated into the three Vλ consensus genes using the unique restriction sites BbsI and HpaI. The ligation mixtures were electroporated into E. coli TG1 cells, yielding 3.7 × 10^… independent colonies. As described above, library quality was checked by sequencing. Again, the codons showed the planned distribution, except for …, which was over-represented at the expense of W L91 (see Figure 7(c)). The length distribution was also analyzed. The majority of clones contained a CDR3 of eight (36%) or nine (42%) residues; the rest had a length of ten (21%) or 11 (2%) residues.

VH CDR3

For the highly variable VH CDR3s, all available rearranged sequences were grouped together, irrespective of the individual subfamilies. A total of 972 sequences were analyzed. The analysis revealed that only position H101 is strongly biased (towards aspartate in 82% of all cases). This is consistent with the salt bridge between D H101 and R H94, which is characteristic of the "kinked base" (Shirai et al., 1996) or "bulged torso" (Morea et al., 1998) structure of the CDR3 loop. D H101 was therefore kept constant. This restricts the library to a subset of CDR-H3 conformations, because other structures are seen in antibodies lacking the R H94/D H101 salt bridge.

Again, the observed variability corresponds well with … (see Figure 7 for the HCDR3 positions). Position H102 was found not to be … .

When designing the library, we decided to base the composition of the trinucleotide mixtures on the overall amino acid composition of natural heavy-chain CDR3s for all positions except H100z and H102. Positions H100z and H102 were analyzed separately. This resulted in three different codon mixtures, named TH1 (for H95 to H100y), TH2 (for H100z), and TH3 (for H102). The compositions of these mixtures are given in Figure 7(a).

Analysis of the length variability of CDR3 (positions 95 to 102) showed a range between four and … residues. Two oligonucleotides were synthesized using the sub-stoichiometric coupling approach: one for the shorter library CDR3Ha, comprising five to 2… residues, and one for the longer library CDR3Hb, comprising nine to 28 residues. Since the two length variants were kept separate during library construction (see below), their use can be adapted to the antigen in question. Moreover, mixing the two libraries appropriately makes it possible to mimic the natural length diversity. The overall yields for the oligonucleotides CDR3Ha and CDR3Hb were 68% and 74%, respectively, and the sub-stoichiometric coupling rates varied between 35% and 55%. Based on these coupling rates, a theoretical length distribution was calculated for the two libraries CDR3Ha and CDR3Hb (see Figure 8).
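A theoretical length distribution can be computed from per-step coupling rates with a simple model. This model is our assumption, since the exact calculation behind Figure 8 is not given here. In each sub-stoichiometric step, a fraction p of all growing chains is extended by one codon, and the remaining chains stay uncapped, so they can still be extended in later steps. The base length and rates in the example are hypothetical; the rates reported above lie between 0.35 and 0.55.

```python
def length_distribution(base_length, step_rates):
    """Theoretical CDR3 length distribution after sub-stoichiometric couplings.

    Assumed model: at each step a fraction p of all growing chains gains one
    codon; the other chains are not capped and remain extendable."""
    dist = {base_length: 1.0}
    for p in step_rates:
        nxt = {}
        for length, frac in dist.items():
            nxt[length] = nxt.get(length, 0.0) + frac * (1 - p)      # not extended
            nxt[length + 1] = nxt.get(length + 1, 0.0) + frac * p    # extended by one codon
        dist = nxt
    return dict(sorted(dist.items()))


print(length_distribution(8, [0.5, 0.5, 0.5]))
# {8: 0.125, 9: 0.375, 10: 0.375, 11: 0.125}
```

With equal rates this reduces to a binomial distribution. Allowing a different rate per step reflects the reported spread between 35% and 55%.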

After cassette preparation, restriction digest and purification, the cassettes were inserted into the scFv libraries that already contained the randomized VL CDR3s described above. Before HCDR3 insertion, we pooled the four Vκ libraries and, separately, the three Vλ libraries. The VH consensus genes, however, were kept separate, except VH1A and VH1B, which were also mixed. This gave 24 sub-libraries: VH1 to VH6, each combined either with the four κ or the three λ genes, and each with either the short or the long HCDR3 cassette. After electroporation into E. coli TG1 cells, we obtained altogether 2.1 × 10^9 independent colonies.
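The sub-library count follows directly from the combinations described above. The sketch below enumerates them; the labels are illustrative strings, not names used in the paper.

```python
from itertools import product

# VH1A and VH1B were pooled, leaving six VH groups.
vh_groups = ["VH1A+VH1B", "VH2", "VH3", "VH4", "VH5", "VH6"]
light_pools = ["kappa pool (Vk1-Vk4)", "lambda pool (Vl1-Vl3)"]
hcdr3_cassettes = ["CDR3Ha (short)", "CDR3Hb (long)"]

sublibraries = list(product(vh_groups, light_pools, hcdr3_cassettes))
print(len(sublibraries))  # 24
```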

The quality of the VH CDR3s was checked by sequencing 257 clones. Figure 7(a) gives the amino acid distributions for the trinucleotide mixtures TH1, TH2, and TH3, again showing excellent agreement with the calculated and designed frequencies. Sequencing of both VH CDR3 length variants showed that both library types follow a Gaussian length distribution, with maxima at 9.0 and 16.6 residues (Figure 8). The actual length distribution was thus shifted towards shorter lengths compared with the theoretical distribution, but the two library variants still covered the whole range of naturally occurring lengths.

The final library was designated HuCAL-1. Altogether, we found the fraction of fully correct library members with CDR-H3 and CDR-L3 as designed …
Figure 8. Distribution of CDR-H3 length variants in the HuCAL-1 libraries (x-axis: HCDR3 length, positions 95 to 102). The experimentally determined distributions for (a) the trinucleotide cassette HCDR3a and (b) the cassette HCDR3b are shown and compared with the length distribution calculated from the sub-stoichiometric coupling (gray columns). For details, see the text.



Diversity and binding constants 

Phage-display and ribosome-display selection experiments were performed against a variety of antigens, including … and whole cells. The HuCAL-1 library comprising all 49 combinations was used for … . Three panning rounds of phage display, or five or six rounds of ribosome display, were performed in each case. After the final round, the selected scFv genes were subcloned as a pool into an expression vector, and the transformants were screened for binding using ELISA or FACS assays. Details of the selection and characterization of binders will be given elsewhere (Krebs et al., unpublished results; Hanes et al., unpublished results). In the great majority of cases, many different scFv fragments that bound the antigen specifically could be identified. The VH and VL framework usage for the first 250 specific binders selected from HuCAL-1 via phage display is given in Table 3. All VH and VL frameworks were represented among the selected binders, and 42 of the 49 framework combinations have been found so far. The VH4 gene segment is rarely used, whereas the VH3 gene segment predominates. This predominance of VH3 also occurs in nature (see Table 1) and is even more pronounced in other libraries (Griffiths et al., 1994; Vaughan et al., 1996). All other HuCAL frameworks seem to be used with similar frequency. VH CDR3 length also varies considerably: among the first 250 specific binders, it ranges from four to 24 residues (data not shown).

Selected binders were purified to homogeneity using affinity chromatography or IMAC, and their monovalent binding constants were measured by surface plasmon resonance (BIAcore). As shown in Table 4, binding constants of peptide binders were in the micromolar range, whereas affinities …
Table 3. Usage of the 42 HuCAL framework combinations found among the selected binders. The framework of each binder was determined by sequencing.
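Monovalent binding constants from surface plasmon resonance are commonly obtained by fitting association and dissociation rate constants to a 1:1 binding model, with KD = k_off / k_on. The sketch below shows only this final step and assumes that the fitting has already been done. The rate constants are hypothetical and are not values from Table 4.

```python
def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant KD = k_off / k_on, in M when k_on is
    in M^-1 s^-1 and k_off in s^-1 (1:1 monovalent interaction assumed)."""
    return k_off / k_on


def to_nanomolar(kd_molar):
    return kd_molar * 1e9


# Hypothetical rate constants.
kd = dissociation_constant(k_on=1e5, k_off=1e-3)
print(round(to_nanomolar(kd), 3))  # 10.0
```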

Discussion 

Here, we describe the realization of fully synthetic human antibody libraries, designated HuCAL. The libraries are built on seven VH and seven VL consensus frameworks, yielding 49 combinations in total.

We have used these first libraries extensively to select highly specific binders against all kinds of antigens, including haptens, DNA, peptides and proteins, among them cell-bound receptor antigens (unpublished results). Intrinsic affinities down to the sub-nanomolar range were obtained against protein antigens, and the majority of binders had dissociation constants between 1 and … nM after only two rounds of selection. All frameworks have been selected. The selected antibodies were found to be expressed in good yields and are surprisingly stable against thermal denaturation. They can be used in typical applications such as ELISA, immunoblotting, FACS analysis, immunoprecipitation or immunohistochemistry, even without any affinity maturation steps. These results verify the successful design of the completely synthetic human antibodies described in this study.

Strategy of modular design 

The 49 consensus genes were derived by a stepwise analysis of human antibody sequences. First, the collected sequences were grouped into families according to sequence homology. Second, germline gene usage was analyzed by determining, for each rearranged sequence in the database, the germline gene from which it was derived. Third, the families of frequently used antibody genes were analyzed for structural diversity of the antigen binding loops, following the concept of canonical CDR conformations established by Chothia and co-workers (Chothia et al., 1989). Fourth, consensus sequences were derived from the rearranged sequences and grouped into families of frequently used human antibodies. Altogether, the analysis resulted in seven VH, four Vκ and three Vλ consensus sequences. Our analysis suggests that this small set of consensus genes covers almost the entire structural repertoire of those human germline genes that are used during the immune response.

Reducing the human antibody repertoire to 49 distinct Fv frameworks, without reducing structural diversity, made it feasible to obtain the sequences by gene synthesis. This in turn allowed us to build features into the genes that facilitate library construction, affinity maturation and E. coli gene expression. Moreover, because the genes and the resulting libraries were constructed separately, each master framework can be analyzed under defined conditions. This is not possible with antibody phage-display libraries derived from natural sequences by PCR cloning. Because unique restriction sites are present across the whole library, CDRs and frameworks can now be shuffled even at the level of pools, without knowledge of the antibody sequences. Furthermore, the approach is modular and can incorporate future knowledge of antibody structure, folding and stability, since individual framework pieces can easily be replaced in future versions.

The availability of separate libraries for each of the combinations allows one to analyze the performance of separate framework combinations and a direct comparison with results obtained from a [...]

[...] may require the blocking of a binding site on a receptor by the antibody, while a different epitope on the receptor may be completely immunodominant. In this case, the preferentially selected [...] framework combination can simply be left out. Alternatively, separate affinity enrichments with subsets of frameworks can be carried out to select diverse epitopes. [...] the performance of this and other libraries may show that particular framework combinations contribute little to the pool of selected binders, while others need to be [...] initial diversity in CDR1 and CDR2. The number of frameworks is, of course, arbitrary and can be adjusted by addition of new and subtraction of unnecessary ones.



Expression and folding properties 

The HuCAL [...] usage. The good expression behavior of most of the synthetic genes probably reflects favorable in vivo folding properties (see below), although the avoidance of codons used only very rarely [...] a prerequisite for high expression yields. The consensus frameworks described here may be an interesting basis for elucidating the framework contributions to differences in folding yield during recombinant antibody expression and to thermodynamic stability. The [...] between the consensus frameworks may improve library quality, since the probability of clones being eliminated during library selections due to very different effects of individual antibody sequences on the bacterial cell physiology is minimized.

In this context, it is interesting to note that the high-expressing humanized antibody hu4D5, which was shown to be expressed 10-50-fold better in E. coli than the murine parental antibody (Carter et al., 1992a), was designed using human consensus frameworks derived from VH3 and Vκ1 (Carter et al., 1992b). The human VH3 germline gene 3-23 (DP-47), which is nearly identical (99% identity) to the HuCAL consensus amino acid sequence of the VH3 germline subfamily, is also the most frequently used VH3 germline gene (see Table 1), and it is very frequently found in antibody phage-display libraries based on human genes (Griffiths et al., 1994; Vaughan et al., 1996; Dörsam et al., 1997; Boot et al., 1998; Sheets et al., 1998). Our theoretical analysis (unpublished results) showed that this framework has very few of the recognized sequence problems. Such problem spots include [...] residues that might promote misfolding and aggregation (Nieba et al., 1997), non-Gly residues in positions with conserved positive phi angles, proline in position H40 (Knappik & Plückthun, 1995) and the disruption of the highly conserved charge cluster around [...] (Proba et al., 1998).

It is reasonable, therefore, to hypothesize that consensus sequences, which are closely related to phylogenetically old progenitor genes, are better adapted to folding in an environment like the E. coli periplasm, where probably most of the folding catalysts and chaperones that normally act on the folding pathway in the ER lumen of the antibody-producing B-cell are absent. It is tempting to speculate that a consensus sequence defines [...] sequences have diverged through genetic drift until the function [...] is no longer maintained. [...] correlation between degree of deviation from the consensus sequence and [...] thermodynamic stability of a murine Vκ domain was found. Recent studies (Wörn & Plückthun, 1999) have shown that, taking all available information into account, very stable and well-[...] antibodies can be [...] the β-sandwich framework is, in principle, a highly stable scaffold. Yet most antibodies [...] from this [...], first in the course of gene duplication during evolution, [...] where unfavorable CDR3s may be introduced, and finally in the [...] antibody domains of very marginal biophysical integrity. The mouse repertoire is thought to be significantly larger than the human repertoire (Almagro et al., 1998), and thus more deviations from the optimum are genetically encoded, partially explaining the difficulties in expressing antibody fragments derived from murine hybridomas. Several residues experimentally shown to be non-optimal (Spada et al., 1998; Proba et al., 1998) have been found to be encoded in some of the mouse germline genes, but in none of the human genes. Such residues are totally avoided in the present design.

Affinity maturation 

The synthetic HuCAL genes were designed to contain unique restriction sites flanking the regions [...]. The resulting modular gene structure, in combination with pre-built CDR library cassettes, will allow the rapid randomization of each CDR loop. We have constructed trinucleotide [...] LCDR1 and HCDR2 cassettes using a design procedure identical with that described here for CDR3 cassettes (unpublished results). Hence an iterative randomization procedure can be envisaged, where the pool of binders after initial library selections can serve as starting material for the next iteration. Such a protocol would mimic the process of affinity maturation by somatic hypermutation [...] achieving this would be different [...] reasoned that this will be more efficient, as more of the mutations will be targeted to the region of interest. So far, the CDR walking process has been [...] protocols, and libraries had to be established for each individual antibody sequence. By using cassettes and the conserved restriction sites of the synthetic genes, however, an optimization of pools is possible, and the procedure is much more convenient. It has now been shown by several groups that the process of CDR walking, i.e. the iterative randomization of CDRs followed by stringent selection protocols, [...] picomolar range could be obtained by this approach.

Nevertheless, framework residues can have indirect effects on binding by affecting the CDR conformations (Foote & Winter, 1992; Saul & Poljak, 1993), and a complete refinement may have to include these regions as well, e.g. by gene shuffling (Patten et al., 1997) or ribosome display (Hanes et al., 1998). Recently, the latter approach has been applied to the HuCAL library, and binders with sub-nanomolar affinities to several antigens have been obtained that do carry further mutations introduced by PCR (unpublished results).



Trinucleotide mixtures for CDR libraries 

Using the 49 combined HuCAL frameworks, the initial libraries were created by randomizing two of the six CDRs using trinucleotide building blocks. Sondek & Shortle (1992) first reported the use of a mixture of two trinucleotide phosphoramidites, but found a coupling yield of only 4% and large differences of relative coupling ratios. Virnekäs et al. (1994) showed that coupling of trinucleotide mixtures can be achieved with coupling yields as high as 96-98.5% by carefully excluding traces of water during preparation of the phosphoramidite mixtures for coupling. However, in a first experiment using eight different trinucleotides, the individual [...] different frequencies (between one and 15 times within 63 positions being sequenced). No further improvement has been reported by other groups using similar building blocks (Lyttle et al., 1995; Ono et al., 1993; Kayushin et al., 1996). Braunagel & Little (1997) used the trinucleotides described by Kayushin et al. (1996) in their approach to create a single-framework antibody library. However, no sequencing results were given to show the quality of the starting library or the distribution of individual codons.

We found that mixtures of trinucleotide phosphoramidites can be coupled in excellent yields. Oligonucleotides with a length of more than 100 bases and containing ten to 15 randomized positions have been successfully synthesized. Furthermore, no bias was found in most cases, and trinucleotide-directed mutagenesis now appears to be the method of choice to achieve full control over the variability.
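The practical weight of coupling yield for oligonucleotides longer than 100 bases follows from simple arithmetic: if every coupling proceeds with the same stepwise yield, the fraction of chains that reach full length is that yield raised to the number of couplings, and each trinucleotide position replaces three mononucleotide couplings with one. The Python sketch below assumes a constant stepwise yield and a support preloaded with the 3'-terminal nucleoside; the function names and the 99% yield in the usage lines are illustrative, not values from this study.

```python
def coupling_steps(length_nt: int, trinucleotide_positions: int = 0) -> int:
    """Number of coupling reactions needed to synthesize an oligonucleotide.

    Assumes the 3'-terminal nucleoside is preloaded on the support and that
    each trinucleotide position is added in a single coupling instead of three.
    """
    mononucleotides = length_nt - 3 * trinucleotide_positions
    if mononucleotides < 1:
        raise ValueError("need at least the support-bound 3' nucleoside")
    return (mononucleotides - 1) + trinucleotide_positions


def full_length_fraction(stepwise_yield: float, n_couplings: int) -> float:
    """Fraction of chains extended in every coupling, for a constant stepwise yield."""
    if not 0.0 <= stepwise_yield <= 1.0:
        raise ValueError("stepwise_yield must lie between 0 and 1")
    return stepwise_yield ** n_couplings


if __name__ == "__main__":
    for trinucleotides in (0, 15):
        steps = coupling_steps(100, trinucleotides)
        print(trinucleotides, steps, round(full_length_fraction(0.99, steps), 2))
```

Under these assumptions a 100-mer needs 99 couplings when built from mononucleotides alone but 69 couplings when 15 codon positions are added as trinucleotides, so at 99% per step the full-length fraction rises from about 0.37 to about 0.50.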

The option of using sub-stoichiometric coupling steps during oligonucleotide synthesis opens up a novel way of creating diversity by sequence and by length variation in a single oligonucleotide. We used sub-stoichiometric coupling for the generation of Vλ and VH CDR3 libraries, and indeed it was possible to create CDR3s of different length with this method. However, the distribution of different length variants was in all cases shifted to shorter library members than calculated, suggesting that the stepwise coupling yield calculated from measuring the concentration of trityl cations cleaved off the 5'-end is higher than the actual coupling yield, i.e. the percentage of oligonucleotide chains being elongated during the sub-stoichiometric step. However, the parameters influencing the outcome of the sub-stoichiometric approach have not been studied in detail.
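The shift toward shorter CDR3s can be pictured with a simple binomial model. The sketch below is an illustration under stated assumptions, not the procedure used here: each sub-stoichiometric step is treated as independently elongating a fraction p of the chains, and chains that miss a step are assumed to stay uncapped so that they simply lack that codon. The step count, base length and the two yields compared are hypothetical.

```python
from math import comb


def length_distribution(n_steps: int, p_coupling: float, base_length: int) -> dict:
    """Fraction of chains at each CDR3 length after n sub-stoichiometric codon steps.

    Assumes each step elongates a fraction p_coupling of the chains, that
    unreacted chains are NOT capped (they just lack that codon), and that the
    steps are independent, which gives a binomial length distribution.
    """
    return {
        base_length + k: comb(n_steps, k) * p_coupling**k * (1 - p_coupling) ** (n_steps - k)
        for k in range(n_steps + 1)
    }


def mean_length(distribution: dict) -> float:
    """Average length, weighting each length class by its fraction of chains."""
    return sum(length * fraction for length, fraction in distribution.items())


if __name__ == "__main__":
    nominal = length_distribution(4, 0.50, 6)  # yield as inferred from trityl release
    actual = length_distribution(4, 0.40, 6)   # hypothetical lower true yield
    print(mean_length(nominal), mean_length(actual))
```

If the true per-step yield (0.40 here) is lower than the value inferred from trityl release (0.50), the mean length drops from 8.0 to about 7.6 codons and every length class shifts accordingly, which is the kind of deviation described above.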

We decided initially to start with a diversification of CDR-L3 and CDR-H3, to imitate natural [...] generation. During the natural process of initial antibody generation, which results from genome rearrangements in the developing B-cell, most of the initial diversification is located in the VH CDR3 region (VDJ) and, to a lesser extent, in the VL CDR3 region (VJ). In the structures of antibodies, both CDR3s form the so-called inner ring of the antigen binding site, and many of the antigen contacts are formed by residues located there (see Figures 1 and 3).

Comparison to semi-synthetic 
antibody libraries 

The use of defined frameworks as the basis for generating an antibody library has been described before. Initial work on randomizing just CDR-H3 (Barbas et al., 1992) has since been extended to Vκ CDR3 (Barbas et al., 1993; Yang et al., 1995; Söderlind et al., 1995) or to single frameworks with all CDRs being randomized (Hayashi et al., 1994; Iba & Kurosawa, 1997). Furthermore, sets of VH [...] combined with a single VL gene (Nissim et al., 1994), or a limited set of VL genes (De Kruif et al., 1995), or a randomized repertoire of VL genes (Griffiths et al., 1994). Most of the semi-synthetic human antibody [...] randomizing CDR3s. For VH, in most approaches Arg H94 and Asp H101 were kept constant, and positions H95 to H100k, and usually H102 as well, were randomized (Hoogenboom & Winter, 1992; Barbas et al., 1993; De Kruif et al., 1995). The length of the CDR3s varied between six and 20 residues, with a preference for loops with six to 14 amino acid residues. De Kruif et al. (1995) constructed a set of eight CDRs between eight and 17 residues long, comprising completely and semi-randomized stretches.

For Vκ CDR3, usually residues L92 to L96 were randomized (Barbas et al., 1993, 1994; Yang et al., 1995; Söderlind et al., 1995). The length of the CDR3s varied between seven and ten residues. Similarly, Hayashi et al. (1994) randomized 11 residues (including L97 [...] framework [...]) of Vλ CDR3 in their approach to construct a [...] framework library [...] randomized. In contrast, Griffiths et al. (1994) used a whole set of 21 Vλ (as well as Vκ) germline genes and added, via PCR, specific CDR [...] codons. In all cases, codons were randomized by using mixtures of mononucleotides during oligonucleotide synthesis.
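Why trinucleotide mixtures give tighter control than mononucleotide mixtures can be checked against the genetic code. The Python sketch below compares a degenerate NNK codon (a common mononucleotide scheme, used here only as an example; the text does not state which mixtures the cited libraries used) with a hypothetical set of one codon per amino acid, as a trinucleotide mixture can provide. It assumes every codon in a mixture is incorporated equally often.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate nucleotide codes needed for the examples below
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}


def expand_degenerate(codon_pattern: str) -> list:
    """All concrete codons encoded by a degenerate pattern such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[base] for base in codon_pattern))]


def stop_free_fraction(codons: list, n_positions: int) -> float:
    """Probability that n randomized positions contain no stop codon,
    assuming every codon in the mixture is incorporated equally often."""
    stop_share = sum(CODON_TABLE[c] == "*" for c in codons) / len(codons)
    return (1.0 - stop_share) ** n_positions


if __name__ == "__main__":
    nnk = expand_degenerate("NNK")
    one_codon_per_aa = {}                 # hypothetical trinucleotide set
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            one_codon_per_aa.setdefault(aa, codon)
    print(len(nnk), round(stop_free_fraction(nnk, 10), 3))
    print(len(one_codon_per_aa), stop_free_fraction(list(one_codon_per_aa.values()), 10))
```

NNK expands to 32 codons including one stop codon (TAG), so ten such positions are stop-free in only about 73% of molecules, whereas a stop-free trinucleotide set keeps that fraction at 1.0; NNK also over-represents amino acids with three codons in the mixture, such as Leu, Ser and Arg.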

In our CDR3 design, we had to decide whether to stay close to the [...], with a preference [...] selected antibodies, or whether to follow a more daring approach. While both approaches would be technically equally feasible, as it would depend only on the types of cassettes used, we opted to first examine [...] to the encoded variety. Even in a loop of this size, many combinations will be non-functional, and we wanted to secure a very high number of initial functional molecules. As library selection technology progresses, e.g. by the use of methods such as ribosome display (Hanes & Plückthun, 1997; Hanes et al., 1998), much larger libraries will be screenable, and a larger set of variants may be simultaneously present, including those with structural defects.
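The trade-off between loop size, residue variety and the number of clones that can be handled can be quantified under idealized assumptions. The Python sketch below assumes that every randomized position takes one of k residues with equal probability and that clones are sampled uniformly and independently; real trinucleotide mixtures biased toward natural CDR3 compositions violate the first assumption. The library size of 10^9 in the usage lines is illustrative.

```python
import math


def sequence_space(n_positions: int, residues_per_position: int) -> int:
    """Distinct loop sequences if each position can take any of k residues."""
    return residues_per_position ** n_positions


def expected_coverage(space: int, library_size: float) -> float:
    """Expected fraction of the sequence space present at least once.

    Assumes clones are drawn uniformly and independently from the space, so
    coverage = 1 - (1 - 1/space) ** library_size (computed in a stable form).
    """
    if space < 1 or library_size < 0:
        raise ValueError("space must be >= 1 and library_size >= 0")
    if space == 1:
        return 1.0 if library_size >= 1 else 0.0
    return -math.expm1(library_size * math.log1p(-1.0 / space))


if __name__ == "__main__":
    for loop_length in (5, 7, 10):
        space = sequence_space(loop_length, 20)
        print(loop_length, space, expected_coverage(space, 1e9))
```

With 20 residues per position, a five-residue loop (3.2 × 10^6 variants) is covered essentially completely by 10^9 clones, a seven-residue loop only to about 54%, and a ten-residue loop to roughly 0.01%.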

When using the known rearranged sequences as a guide, it becomes an important question to what degree they represent "frozen accidents", explainable by their evolutionary ancestry both at the germline and somatic level, or whether they are actually positively selected or are even due to genetic hotspots encoded into the DNA sequence. The processes underlying somatic hypermutation are still not well understood. It was shown that heterologous genes replacing V gene segments undergo hypermutation in vivo as well (Yélamos et al., 1995), and therefore it seems very unlikely that the V genes themselves determine at the genetic level where hypermutation occurs. A more reasonable explanation would be that selection determines which mutations finally survive. Various efforts have addressed this question (see, for example, Dörner et al., 1998).

Weighing all arguments, we decided to take the natural distribution as our starting point. The modular approach permits any desired optimization strategy to be carried out readily once primary binders have been obtained, such as the introduction of VL CDR1 and/or VH CDR2 cassettes into single binders, or even pools of binders, since the sequences share identical restriction endonuclease sites adjacent to the CDRs. It would also be easily possible, for example, to keep the CDR3s of the selected pool of primary binders constant and shuffle VL frameworks with randomized CDRs. Alternatively, new sets of CDR3 libraries can be designed based on sequence motifs identified in the pool of primary binders. Furthermore, chain shuffling or even shuffling of elements such as CDRs or frameworks can now be performed by restriction digest and religation.

Since HuCAL is fully synthetic, it is always possible to control the individual steps by analyzing the restriction pattern of individual clones or by sequencing, with artifacts being easily identified, whereas [...] more or less a black box.

By these means, exploration of the sequence space of human antibodies will be much faster and more efficient than by using the conventional approaches. Finally, we expect that the careful analysis of selected sequences will yield a wealth of structural information that can flow into subsequent versions of the library.






Conclusions and perspective 

The HuCAL concept is based on covering the essential features of the human antibody repertoire with a minimal number of different sequences, which are [...] extensive manipulation [...] techniques. The 49 combinations of master genes have been cloned as scFv genes in both orientations and [...] other [...] fragments, stabilized for example by disulfide bridges (Glockshuber et al., [...]; [...] et al., 1995; Rodrigues et al., 1995), or fragments without any disulfide bonds (Wörn & Plückthun, 1998) useful for intrabody approaches ([...]; Wörn et al., 2000), are easily adaptable and can be analyzed on the level of the master genes, before actual library generation. Libraries can be rapidly created by inserting pre-built CDR cassettes into each of the 49 genes either separately or as a mixed sequence pool, and the analysis of binding variants is facilitated by the fact that only small regions in the sequence are varied and that the three-dimensional models of all master frameworks have been built. It will therefore be possible for the first time to investigate experimentally why nature has evolved the distinct structural motifs found in the human antibody repertoire, and whether there are correlations of antibody structure with antigen class, antibody affinity and specificity. Future versions of HuCAL may therefore be enriched with antigen-type specific features.



Materials and Methods 

Bacterial strains, phages and plasmids

Molecular cloning was carried out using the E. coli strains JM83 (Yanisch-Perron et al., 1985), XL1-Blue (Stratagene) or TOP10 (Invitrogen). For expression experiments JM83 was used [...] phage display [...] VCSM13 [...]. [...] products were cloned into pCR-Script SK(+) (Stratagene) for sequencing [...] pBS vector series [...]

[...] inserting a cassette created by annealing the oligonucleotides [...] EcoRI, thereby introducing the short improved FLAG tag [...]. The resulting vector, designated pBS12, was later used for the assembly of scFv genes in the H-L orientation as well as for expression analysis. Second, the XhoI/EcoRI fragment from pBS12 was replaced by a cassette created by annealing the oligonucleotides [...], thereby introducing an stII signal sequence containing a unique [...] for the generation [...]. The resulting vector was designated pBS13. The stII gene fragment was extended by inserting a cassette created by annealing the oligonucleotides [...] into pBS13 [...] the short improved FLAG tag. The resulting vector, designated pBS14, was later used for the assembly of scFv genes in the L-H orientation. [...] the [...] site [...] resistance [...]. The phage display vector pIG10.3 is a derivative of pIG10 (Ge et al., 1995) in which [...] regions of the mature full-length gene III were deleted. Briefly, the EcoRI/HindIII restriction fragment in the phagemid pIG10 was replaced by the c-myc tag for detection with the monoclonal antibody 9E10 (Munro & Pelham, 1987), followed by an amber codon and the truncated version of the gene III [...]. The [...] the pMorph vector series, which is compatible with the [...] cloning, will be described elsewhere ([...] unpublished results). All vectors were constructed using site-directed mutagenesis (Kunkel, 1985), recursive PCR (Prodromou & Pearl, 1992) and overlap-extension PCR (Ge & Rudolph, 1997), and all constructs were subsequently verified by DNA sequencing ([...], Germany).



Collection of human antibody sequences

Functional human germline sequences were downloaded ([...] et al., 1997) from the Kabat database and from V BASE. Rearranged sequences were [...] the Kabat database [...] downloaded, variable domain amino acid sequences extracted and converted to the one-letter code. Sequences less than 90% complete or [...] in the regions of interest were eliminated. The automatic [...] by the program Pileup (Wisconsin [...]).






[...] are given in Table 2 of the Supplementary Material. [...] The resulting construct was designated pBS11. This [...] gene fragment [...] was later used for insertion of V[...] genes for the generation of [...]






[...] where all substitutions [...] took place [...] in the single-letter code according to standard IUPAC nomenclature. Germline sequences are named according to accepted locus nomenclature for each segment (Giudicelli et al., 1997).



Molecular modeling 



After alignment and numbering according to Kabat, the [...] were normalized by checking for multiple entries of closely related sequences, which we thought would lead to an artificial bias towards a specific set of rearranged sequences. Subsequently, the rearranged and the germline sequences were grouped into the various [...] to each [...] of a given rearranged [...] each germline sequence [...] from position 1 to 94 (VH), position 1 to 95 (Vκ) or 1 to 95B (Vλ). If the result was ambiguous, e.g. the [...] from two or [...], or if the [...] gave less than 80% identity, indicating either a very high level of somatic [...] or the [...]
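The germline assignment step described above, comparing each rearranged sequence with every germline sequence over a fixed region and rejecting assignments below 80% identity, can be sketched as follows. This is a minimal Python illustration: the sequences are assumed to be aligned and trimmed to the compared region already, ties are flagged for manual inspection because the handling of ambiguous cases is only partly legible in the text, and all names and toy sequences are hypothetical.

```python
def percent_identity(query: str, reference: str) -> float:
    """Identity (%) over aligned, equal-length sequences; '-' marks a gap and is skipped."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(query, reference) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def assign_germline(rearranged: str, germlines: dict, min_identity: float = 80.0):
    """Return (germline name or None, status) for one rearranged V region.

    Sequences are assumed to be pre-trimmed to the compared region (for
    example positions 1-94 for VH). Ties are flagged rather than resolved.
    """
    scores = {name: percent_identity(rearranged, seq) for name, seq in germlines.items()}
    best = max(scores.values())
    if best < min_identity:
        return None, "below threshold"
    top = sorted(name for name, score in scores.items() if score == best)
    if len(top) > 1:
        return None, "ambiguous: " + ", ".join(top)
    return top[0], "assigned"


if __name__ == "__main__":
    toy_germlines = {"GL_A": "QVQLVQSGAE", "GL_B": "EVQLVESGGG"}  # hypothetical
    print(assign_germline("QVQLVESGAE", toy_germlines))
```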



[...] using the Homology, Biopolymer [...] modules of the program InsightII ([...], San Diego, CA). To align different [...]



By this analysis, the subfamilies that are used frequently by the human immune system were identified. The databases of rearranged sequences were used to calculate a consensus sequence for each frequently used subfamily. This was done by counting the number of [...] (variability) and selecting the amino acid residue most frequently used at each position. The consensus sequences were compared with the consensus [...] of the germline families to see whether the rearranged sequences were biased at certain positions towards amino acid residues that do not occur in the collected germline sequences, but this was found not to be the case. Subsequently, the CDR1 and CDR2 regions of the consensus sequences were replaced with the corresponding regions of the germline sequences that were most frequently used by the human immune system. For the framework 4 region, the consensus of all rearranged sequences was chosen. For each of these consensus sequences, the most homologous rearranged sequences were then identified and used for validating the consensus by identifying all framework residues that differed between the consensus and the most homologous rearranged sequences. Those residues were regarded as artificial and checked by two means: first, the local context of the artificial residue was compared with the corresponding stretch of all the rearranged sequences in the database; and second, the long-range interactions of [...] residues [...]
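The consensus calculation described above, taking the most frequent residue at each aligned position, can be sketched in Python. The variability measure used in the study is only partly legible, so the Wu-Kabat variability (number of different residues divided by the frequency of the most common one) is used here as an assumed stand-in; gap handling and alphabetical tie-breaking are also choices made for this sketch.

```python
from collections import Counter


def consensus_and_variability(aligned: list, gap: str = "-"):
    """Per-position consensus residue and Wu-Kabat variability for aligned sequences.

    Variability = (number of different residues at the position) /
                  (frequency of the most common residue); 1.0 means fully conserved.
    Gaps are ignored; ties for the most common residue are broken alphabetically.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    consensus, variability = [], []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned if seq[i] != gap)
        if not counts:                      # column contains only gaps
            consensus.append(gap)
            variability.append(float("nan"))
            continue
        top = min(counts, key=lambda aa: (-counts[aa], aa))
        frequency = counts[top] / sum(counts.values())
        consensus.append(top)
        variability.append(len(counts) / frequency)
    return "".join(consensus), variability


if __name__ == "__main__":
    print(consensus_and_variability(["QVQL", "QVKL", "EVQL"]))  # toy alignment
```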



[...] least-squares fit of the Cα positions of residues H3-H7, H19-H23, H[...] (according to structural criteria, not according to Kabat), H44-H[...], H67-H71/H78-H82, H88-H94 and [...]-L7, L20-L24, L33-L39, L43-L49, L62-L66, L71-L75, L84-L90 and L97-L103 (VL) was performed. The experimental structures displaying the highest degree of sequence similarity to the different HuCAL constructs are listed in Table [...] of the Supplementary Material. Structural differences between these templates were analyzed to identify the sequence differences responsible for the deviations. The [...] structure of the humanized 4D5 version 8 (PDB entry 1FVC). Coordinates were assigned using the Homology module, and the resulting models were checked for stereochemical [...] and [...] minimization [...] (Discover, CFF91 force field). The stereochemical quality of the domain models was evaluated with the program PROCHECK [...].
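The least-squares superposition of selected framework positions described above is commonly computed with the Kabsch algorithm. The sketch below uses NumPy and is not the InsightII implementation used in the study; the caller is assumed to supply matching rows for the same residues in both structures, for example the framework positions chosen for the fit.

```python
import numpy as np


def kabsch_rmsd(mobile, target):
    """Least-squares superposition of two N x 3 coordinate sets (Kabsch algorithm).

    Returns (rmsd, rotation, translation) such that
    mobile @ rotation.T + translation is optimally superimposed on target.
    Rows must correspond to the same atoms in both sets.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of matching shape")
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    h = p.T @ q                                  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - mobile_center @ rotation.T
    fitted = mobile @ rotation.T + translation
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
    return rmsd, rotation, translation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = rng.normal(size=(20, 3))             # hypothetical C-alpha coordinates
    shifted = model + np.array([1.0, 2.0, 3.0])  # same structure, translated
    print(kabsch_rmsd(shifted, model)[0])
```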



Gene synthesis and assembly 

Consensus amino acid sequences were back-translated into DNA sequences using the GCG software package (Genetics Computer Group, Madison, WI, USA) and a codon definition file that included only the codons that are used frequently in E. coli. All possible silent (and commercially available) restriction sites based on version 501 of the REBASE list of restriction enzymes (Roberts & Macelis, [...]) were subsequently identified in the resulting DNA sequences and tabulated. These tables were used to identify all cleavage sites that were located close to the [...] and that could be introduced into all genes of the three classes (VH, Vκ or Vλ) simultaneously at the same position. Further editing was done as described in Results. For each of the 14 resulting genes, six overlapping oligonucleotides were designed. Since both the CDR3s and the framework 4 gene segments were identical in all Vκ, Vλ and VH genes, respectively, this part was constructed [...]






[...] position. Others [...] chosen and analyzed as described above. Finally, the consensus sequences were compared to the corresponding germline sequences, and the number of differences was tabulated.
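The back-translation and the search for silent restriction sites can be sketched in a few lines of Python. This is not the GCG procedure: the restricted codon set below is hypothetical (it is checked against the standard genetic code but is not the codon definition file used for HuCAL), back-translation simply takes the first allowed codon, and one recognition sequence is searched at a time on both strands. A site counts as silent when some combination of allowed codons creates it without changing the encoded peptide.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

# HYPOTHETICAL restricted codon set for illustration only; it is not the codon
# definition file used for the HuCAL genes.
ALLOWED_CODONS = {
    "A": ("GCG", "GCC"), "R": ("CGT", "CGC"), "N": ("AAC", "AAT"), "D": ("GAT", "GAC"),
    "C": ("TGC", "TGT"), "Q": ("CAG",), "E": ("GAA", "GAG"), "G": ("GGC", "GGT"),
    "H": ("CAT", "CAC"), "I": ("ATT", "ATC"), "L": ("CTG",), "K": ("AAA",),
    "M": ("ATG",), "F": ("TTT", "TTC"), "P": ("CCG",), "S": ("AGC", "TCT"),
    "T": ("ACC", "ACG"), "W": ("TGG",), "Y": ("TAT", "TAC"), "V": ("GTG", "GTT"),
}
# Sanity check: every allowed codon really encodes its amino acid.
assert all(GENETIC_CODE[c] == aa for aa, codons in ALLOWED_CODONS.items() for c in codons)


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def back_translate(peptide: str) -> str:
    """Back-translate using the first allowed codon for every residue."""
    return "".join(ALLOWED_CODONS[aa][0] for aa in peptide)


def silent_site_positions(peptide: str, site: str) -> list:
    """Nucleotide positions at which `site` (on either strand) can be created
    using only allowed codons, i.e. without changing the encoded peptide."""
    targets = {site, reverse_complement(site)}
    hits = set()
    for i in range(len(peptide)):
        for offset in range(3):                       # start of the site within codon i
            n_codons = -(-(offset + len(site)) // 3)  # codons spanned (ceiling division)
            if i + n_codons > len(peptide):
                continue
            options = [ALLOWED_CODONS[aa] for aa in peptide[i:i + n_codons]]
            for combo in product(*options):
                if "".join(combo)[offset:offset + len(site)] in targets:
                    hits.add(3 * i + offset)
                    break
    return sorted(hits)


if __name__ == "__main__":
    print(back_translate("MEFK"))                   # ATGGAATTTAAA
    print(silent_site_positions("MEFK", "GAATTC"))  # EcoRI site: [3]
```

For the dipeptide Glu-Phe the allowed codons GAA and TTC create an EcoRI site (GAATTC), so the search reports nucleotide position 3 even though the default back-translation uses TTT; repeating the search for each candidate enzyme and each gene gives a table of possible positions.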



http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

ftp://ftp.ebi.ac.uk/pub/databases/codonusage/



[...] 1992) was performed using [...] pmol of each of the oligonucleotides in a 100 µl reaction volume containing [...] of dNTPs and five units of Pfu polymerase (Stratagene). After a first cycle with three minutes at 94 °C, two minutes at 60 °C and one minute at 72 °C using a hot-start procedure, [...] PCR cycles were performed ([...], two minutes at 60 °C and one minute at 72 °C). The products were purified using the QIAgen PCR purification kit and blunt-end ligated with either the pCR-Script KS(+) vector (cut with SrfI) or the pZero-1 vector (cut with EcoRV). Insert-containing clones were screened by blue-white selection (pCR-Script KS(+)) or directly picked (pZero-1) and sequenced.
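The six overlapping oligonucleotides designed for each gene can be laid out with a short routine. In this Python sketch the equal-length split, the 20-base overlap and the alternating sense/antisense orientation are assumptions for illustration; the actual oligonucleotide lengths and overlaps are not given in this section.

```python
def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_assembly_oligos(gene: str, n_oligos: int = 6, overlap: int = 20) -> list:
    """Split a gene into n overlapping oligonucleotides for assembly PCR.

    Neighbouring oligos share `overlap` bases; every second oligo is written as
    the reverse complement so that neighbours anneal on opposite strands.
    """
    total = len(gene) + (n_oligos - 1) * overlap
    base, extra = divmod(total, n_oligos)
    if base <= 2 * overlap:
        raise ValueError("oligos would be too short for the requested overlap")
    oligos, start = [], 0
    for i in range(n_oligos):
        length = base + (1 if i < extra else 0)
        segment = gene[start:start + length]
        oligos.append(segment if i % 2 == 0 else reverse_complement(segment))
        start += length - overlap
    return oligos


def reassemble(oligos: list, overlap: int) -> str:
    """Rebuild the sense strand from the oligos to check the design."""
    sense = [o if i % 2 == 0 else reverse_complement(o) for i, o in enumerate(oligos)]
    gene = sense[0]
    for previous, current in zip(sense, sense[1:]):
        if previous[-overlap:] != current[:overlap]:
            raise ValueError("neighbouring oligos do not overlap as expected")
        gene += current[overlap:]
    return gene


if __name__ == "__main__":
    demo_gene = "ATG" + "GCT" * 110 + "TAA"  # hypothetical 336-bp sequence
    oligos = design_assembly_oligos(demo_gene)
    print([len(o) for o in oligos], reassemble(oligos, 20) == demo_gene)
```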



[...] genes were synthesized with their authentic N termini and without the short FLAG sequence, which was added later during the construction of [...] scFv display [...]. The heavy chain CH1 domain (subtype IgG1, GenBank accession number [...]14), including the VH framework 4 region [...] and inserted into pCR-Script KS(+). The CH1 gene was designed with E. coli codon usage, [...] SalI and EcoRI [...] incorporated at the 5'- and 3'-ends, respectively, and most internal restriction sites were removed during the gene design. In a second step, the VH dummy CDR3 region was inserted [...] using the oligonucleotides [...] framework 4-CH1 [...] by a three-fragment ligation [...]



[...] the Vκ gene fragments were PCR amplified [...] pBS13b using [...] (where x denotes the backward [...] primer OLFw4M), and the PCR products were blunt-end ligated [...] vector constructed as described below), which had been cut with EcoRV/BsiWI and made blunt-ended by treatment with S1 nuclease [...] the N-terminal codons had been changed to the EcoRV recognition sequence encoding aspartate-isoleucine, in order to allow the same scFv [...] plasmids were used for assembly of the Vκ-VH scFv expression vectors (see below). In order to assemble Vκ-VH scFv [...] a cassette constructed by annealing the oligonucleotides [...] was inserted into the [...] vectors pBS[...] cut with [...]/EcoRI, thereby [...] the CL constant domain gene [...]






The four synthesized Vκ genes covered the [...] restriction site [...] framework region prior to the CDR3 (Eco57I). The human kappa constant domain Cκ (GenBank accession number P01834), the VL dummy [...] framework 3 region [...] at the 5'- and 3'-end, respectively. The Vκ gene fragments (NsiI/[...]) were then assembled with the CDR3-framework 4-Cκ sequence (SphI/Eco57I) by a three-fragment ligation with the vector pBS13 ([...]), yielding four kappa light chain fragments for construction of Fab expression vectors.

The three synthesized Vλ genes covered the sequences [...] site located in [...] to the unique 3' restriction site located in the framework 3 region prior to the CDR3 (BbsI). All genes were synthesized with their authentic N termini, i.e. without the aspartate-isoleucine stretch encoded by an EcoRV site used for the Vκ genes. The human lambda constant domain Cλ (GenBank accession number P01842), including the Vλ framework 4 region, the VL dummy CDR3 and part of the Vλ framework 3 region, was assembled as [...] were assembled with the CDR3-framework 4-Cλ [...] vector pBS[...], yielding three lambda light chain fragments for construction of Fab expression vectors. In order to assemble Vλ-VH scFv vectors, the Vλ gene fragments were further modified [...]



Fab expression plasmids were constructed by combining each of the heavy chain Fd fragments cut with [...]/EcoRI and each of the light chain fragments cut with [...] and the [...] vector cut with [...] in a [...] reaction. The 49 resulting plasmids were verified by restriction enzyme digestions. Here, the Vλ gene fragments contain their authentic N termini, and there is no FLAG tag sequence attached to the antibody Fab genes.

The scFv expression plasmids in the orientation VL-VH were constructed as follows: the Cκ gene fragment from pBS[...]_Vκ3Cκ was removed by cutting the plasmid with [...] and replaced by an oligonucleotide cassette encoding a 20 amino acid residue linker plus the additional restriction sites MfeI and EcoRI for later insertion of the VH genes. The cassette was constructed by annealing the oligonucleotides OLHLiP and OLHLiM. Subsequently, the remaining Vκ and Vλ genes were inserted as [...]/BsiWI fragments, and the VH genes were inserted as MfeI/EcoRI fragments.

The 49 scFv expression plasmids in the orientation VH-VL were constructed as follows: the CH1 gene fragment from pBS12_VH3CH1 was removed by cutting the plasmid with [...] and replaced by [...] a 20 amino acid residue linker plus the additional restriction sites for later insertion of the VL genes. The cassette was constructed by annealing the oligonucleotides [...] and OHLLiM. Subsequently, the Vκ and Vλ genes were inserted as [...], and [...] as [...]/BlpI fragments. These 49 vectors were used for expression analysis, and the scFv genes were later used for library construction.



Expression analysis



Growth curves and expression data were obtained essentially as described (Knappik & Plückthun, 1995). Briefly, E. coli JM83 cultures containing the appropriate scFv expression vectors were grown at 30 °C and induced with 1 mM IPTG. After [...] hours of expression, [...] lysed and separated into soluble and insoluble cell fractions by [...]. [...] fractions were assayed by [...] staining using [...] anti-FLAG [...] the amount of [...] and [...] was quantified densitometrically. The scFv gene H3κ2 was used as internal control in each [...] experiment.

Expression kinetics were measured as follows: E. coli JM83 cells were transformed with the scFv genes cloned in the expression vector pMorphx7_FS (unpublished results) […] flask cultures at […] °C. After induction with 1 […]M IPTG, 50 ml of culture was harvested each hour; the cells were normalized to A = 50 and lysed by sonication, and the crude extracts were stored at -20 °C. After ten hours of induction, the remaining culture (500 ml) was […]. […] different […] determined by ELISA measurements, in which the purified antibody fragment of known concentration served as an internal standard for calculating the scFv amount from the ELISA signal obtained.
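Converting an ELISA signal into an scFv amount with a purified standard can be sketched as interpolation on a standard curve. The text only names a purified fragment of known concentration as internal standard; the dilution series, the piecewise-linear interpolation, the function name and all numbers below are assumptions.

```python
from bisect import bisect_left


def interpolate_amount(signal, std_amounts, std_signals):
    """Estimate an scFv amount from an ELISA signal by piecewise-linear
    interpolation on a standard curve.

    std_amounts: amounts of the purified standard (e.g. ng per well).
    std_signals: ELISA signals measured for those amounts, strictly increasing.
    """
    if len(std_amounts) != len(std_signals) or len(std_signals) < 2:
        raise ValueError("need at least two matching standard points")
    if any(a >= b for a, b in zip(std_signals, std_signals[1:])):
        raise ValueError("standard signals must increase strictly")
    if not std_signals[0] <= signal <= std_signals[-1]:
        raise ValueError("signal outside the standard range")
    i = bisect_left(std_signals, signal)
    if i == 0:  # signal equals the lowest standard
        return float(std_amounts[0])
    x0, x1 = std_signals[i - 1], std_signals[i]
    y0, y1 = std_amounts[i - 1], std_amounts[i]
    return y0 + (signal - x0) * (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    amounts = [0.0, 10.0, 20.0, 40.0]   # hypothetical ng of purified scFv
    signals = [0.05, 0.30, 0.55, 1.00]  # hypothetical absorbance readings
    print(round(interpolate_amount(0.425, amounts, signals), 3))  # 15.0
```

Signals outside the calibrated range are rejected rather than extrapolated, since the response of an ELISA usually saturates at high antigen amounts.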

CDR analysis and library design

[…] rearranged human antibody VH and VL sequences was used for analysis of CDR length and composition. For analysis of VH CDR3s, all sequences were grouped together because sequence […]. For Vκ and Vλ CDRs, the subfamilies were […] separately. Within the […], the CDRs were grouped according to CDR length. Assignment of the individual groups to canonical structures was done according to […]. All analyses […]
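Grouping CDRs by length and then computing the amino-acid composition at each position can be sketched in a few lines. This assumes the CDRs have already been extracted as one-letter amino-acid strings; the example sequences are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_length(cdrs):
    """Map CDR length -> list of CDR sequences of that length."""
    groups = defaultdict(list)
    for seq in cdrs:
        groups[len(seq)].append(seq)
    return dict(groups)


def position_composition(seqs):
    """Per-position amino-acid frequencies for a group of equal-length CDRs."""
    if not seqs:
        raise ValueError("empty group")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in a group must have the same length")
    n = len(seqs)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        profile.append({aa: c / n for aa, c in counts.items()})
    return profile


if __name__ == "__main__":
    # Invented light-chain CDR3 examples, not data from the study.
    cdrs = ["QQYNSYPLT", "QQSYSTPLT", "QQYDNLPLT", "LQHNSYPWT", "QQYGSSPT"]
    groups = group_by_length(cdrs)
    print(sorted(groups))                      # [8, 9]
    print(position_composition(groups[9])[0])  # {'Q': 0.75, 'L': 0.25}
```

Grouping first matters because a positional profile is only meaningful when all sequences in a group share one length (and hence, ideally, one canonical structure).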



Synthesis of trinucleotide-containing oligonucleotides

Synthesis of O-methyl trinucleotide phosphoramidites and their […] in automated […] has been described (Virnekäs et al., 1994). Trinucleotide mixtures were prepared by mixing appropriate stoichiometric […]. […] were dried under argon and dissolved to yield 0.1 M solutions […]. […] was performed on an Applied Biosystems […] DNA […] obtained from […] Biosystems […] MWG (Ebersberg, Germany). […] oligonucleotide syntheses were performed on columns with polystyrene support (1000 Å, 40 nmol […], part 401072 to 401075). For synthesis of the mononucleotide building blocks of the […], conventional mononucleotide phosphoramidites and the standard synthesis cycle (single coupling, 15 seconds wait step) were used. When coupling trinucleotide mixtures, the standard cycle was changed to […] seconds wait step after […] step after the second coupling. For sub-stoichiometric couplings, the time for delivering activated phosphoramidite solution to the column was reduced to achieve approximately 50% coupling yield. If sub-stoichiometric coupling rates were much higher or lower than 50%, either the delivery time was adjusted for the subsequent couplings to obtain an average yield of 50% over all sub-stoichiometric couplings, or an […] coupling step was […]. Deprotection of the oligonucleotides was performed as described […]. All trinucleotide-containing oligonucleotides used for CDR3 library generation are given in Table 3 of the Supplementary Material.
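Two pieces of arithmetic in this paragraph can be sketched. Because all trinucleotide solutions are 0.1 M, the volume of each in a mixture is proportional to its desired molar share. Keeping the average sub-stoichiometric coupling yield at 50% means each new coupling has to make up for the deviation of the earlier ones. The sketch assumes every trinucleotide couples with the same efficiency, which the text does not state; the codons, shares, volumes and function names are hypothetical.

```python
def mixture_volumes(fractions, total_volume_ul):
    """Volumes (ul) of equimolar trinucleotide stocks for a desired mixture.

    fractions maps codon -> relative molar share; shares need not sum to 1.
    With all stocks at the same concentration (0.1 M here), volume is
    proportional to molar share. Equal coupling efficiency is assumed.
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("fractions must sum to a positive number")
    return {codon: total_volume_ul * share / total
            for codon, share in fractions.items()}


def next_target_yield(previous_yields, target_average=50.0):
    """Coupling yield (%) the next sub-stoichiometric step must reach so that
    the mean over all such steps equals target_average, clamped to 0-100%."""
    n = len(previous_yields) + 1
    needed = target_average * n - sum(previous_yields)
    return min(max(needed, 0.0), 100.0)


if __name__ == "__main__":
    # Hypothetical mix: Ala (GCT), Asp (GAT) and Tyr (TAT) in a 1:1:2 ratio.
    print(mixture_volumes({"GCT": 1, "GAT": 1, "TAT": 2}, 100.0))
    # {'GCT': 25.0, 'GAT': 25.0, 'TAT': 50.0}
    print(next_target_yield([60.0]))        # 40.0
    print(next_target_yield([60.0, 45.0]))  # 45.0
```

The clamp reflects that a single coupling cannot exceed 100% or fall below 0%; when the required yield lies outside that range, the average cannot be restored in one step.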



Cassette preparation 

[…] and purified with […] column (Pharmacia) according to the supplier's manual. The complementary strand was synthesized with Klenow polymerase […]. Approximately 5 nmol of oligonucleotide were mixed with the cassette-specific corresponding primer at a ratio of 1:1.2, heated to 80 °C for ten minutes and then slowly cooled to room temperature. Then 10 µl of a 10 mM dNTP mixture, 15 µl of Klenow buffer, […] µl of Klenow polymerase and water to a final volume of 150 µl were added. The fill-in reaction was performed at 37 °C for two hours and purified with a Nick Spin column according to the supplier's manual (Pharmacia Biotech). The fill-in reaction was checked by analytical PAGE […] fill-in products.
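The amounts in the annealing and fill-in step follow from simple arithmetic. A sketch, assuming the 5 nmol of oligonucleotide is the reference quantity; the volume of Klenow polymerase is illegible in the source, so it and the annealed-DNA volume in the example are placeholders, as are the function names.

```python
def primer_amount(oligo_nmol, primer_ratio=1.2):
    """Primer (nmol) needed for an oligo:primer ratio of 1:primer_ratio."""
    return oligo_nmol * primer_ratio


def fill_in_water(final_volume_ul, component_volumes_ul):
    """Water (ul) needed to bring the listed components to the final volume."""
    water = final_volume_ul - sum(component_volumes_ul.values())
    if water < 0:
        raise ValueError("components exceed the final volume")
    return water


if __name__ == "__main__":
    oligo = 5.0  # nmol, as in the text
    print(primer_amount(oligo))  # 6.0 nmol primer
    # Oligonucleotide concentration in the 150 ul fill-in reaction:
    print(round(oligo * 1000.0 / 150.0, 1))  # 33.3 pmol per ul
    components = {
        "annealed DNA": 100.0,     # placeholder volume
        "10 mM dNTP mix": 10.0,    # from the text
        "Klenow buffer": 15.0,     # from the text
        "Klenow polymerase": 5.0,  # placeholder volume
    }
    print(fill_in_water(150.0, components))  # 20.0
```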

PCR reactions were performed using 1 µl of the fill-in reaction mixtures (approximately 25 pmol) and […] mol of […] primer plus second […] in each case ([…]; one minute at 94 °C, one minute at 54 °C, one minute at 72 °C). The PCR mixtures were purified with a Nick Spin column. The oligonucleotide library cassettes were prepared for ligation by adding 30 µl of […] buffer to 100 µl of the purified PCR product, 150 units of each […] and water to a final volume of 300 µl, and by digesting overnight at 37 °C. The cassettes were purified on 4% […] agarose gels ([…]) and recovered from the gel via BIOTRAP elution (Schleicher & Schuell, Germany) according to the supplier's manual (approximately two hours at 100 V/50-70 mA). The solutions containing the cassettes were desalted with Nick Spin columns. The quality of the cassettes was checked by analytical PAGE […] gels […].
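The digestion set-up is consistent with a 10× reaction buffer: 30 µl in a 300 µl reaction gives 1× final. The buffer strength is an inference, not stated in the text. The enzyme stock concentration of 10 units per µl and the count of two enzymes ("each") used below are assumptions, and `digest_setup` is a hypothetical helper.

```python
def digest_setup(final_ul, dna_ul, enzyme_units, units_per_ul,
                 n_enzymes=2, stock_x=10.0):
    """Return (buffer_ul, enzyme_ul_each, water_ul) for a restriction digest.

    The buffer is diluted from a stock_x-fold stock to 1x in final_ul.
    """
    buffer_ul = final_ul / stock_x
    enzyme_ul = enzyme_units / units_per_ul
    water_ul = final_ul - buffer_ul - dna_ul - n_enzymes * enzyme_ul
    if water_ul < 0:
        raise ValueError("components exceed the final volume")
    return buffer_ul, enzyme_ul, water_ul


if __name__ == "__main__":
    # 100 ul PCR product, 150 units of each enzyme, 300 ul final (from the text);
    # 10 units/ul enzyme stocks and two enzymes are assumptions.
    print(digest_setup(300.0, 100.0, 150.0, 10.0))  # (30.0, 15.0, 140.0)
```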



Generation of the HuCAL library

Template VH vectors were created by inserting the seven […] master gene Fd fragments from the vector […] into the […] (unpublished results). The VH CDR3 sequences were then replaced by a 1220 bp dummy fragment containing the […] subsequent steps for vector fragment preparation. The template VH vectors were cut with StyI/HindIII to remove the CH1 gene fragment, and the vector fragments were purified. At this step, the […] vector fragments encoding the VH1A and VH1B master genes were mixed in an equimolar ratio […], resulting in six VH template vectors.

Template VL vectors were constructed by first inserting the HuCAL scFv master genes containing the […] as described above. The seven template VL vectors were then purified, and the […] was removed by cutting with […] for the Vκ gene-containing vectors and […] for the Vλ gene-containing vectors. The prepared trinucleotide cassettes encoding the VL CDR3 […] were ligated with the […] VL template vectors (25 fmol of each vector was ligated with 250 fmol of each cassette for three hours at room temperature), and the ligation mixtures were electroporated in 0.9 ml of E. coli TG1 cells, yielding altogether […] independent colonies.
The colonies were scraped off the selection plates, and the VL CDR3 libraries were stored in 20% (w/v) glycerol at -80 °C. Phagemid DNA of the four Vκ and the three Vλ libraries was prepared, and two pools were created by mixing the four Vκ and the three Vλ libraries, respectively, in an equimolar ratio. The two DNA pools were treated with StyI/HindIII, the VL gene libraries were purified by agarose gel electrophoresis, and 75 fmol of each pool was ligated with 25 fmol of each of the six VH template vectors (see above) and electroporated in 0.3 ml of E. coli TG1 cells, resulting in altogether 2.3 × 10[…] colonies for the 12 library pools. In the final step, these 12 libraries were prepared as DNA and cut with BssHII/StyI to remove the β-lactamase dummy gene inside the VH CDR3 region, and the two VH CDR3 trinucleotide library cassettes (HCDR3a and HCDR3b) were inserted separately by ligation using the same […] electroporation […]. […] independent colonies were obtained altogether […]. The […] libraries were scraped off the selection plates, and the 24 HuCAL libraries were stored as aliquots in 20% glycerol at […] °C.
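The library bookkeeping multiplies out as follows: merging the VH1A and VH1B templates leaves six VH templates, pairing each with the κ and λ pools gives 12 pools, and inserting the two HCDR3 cassettes separately gives 24 libraries. A sketch with hypothetical labels; VH5, VH6 and the pool names are assumptions, while HCDR3a and HCDR3b come from the text.

```python
from itertools import product

VH_MASTER = ["VH1A", "VH1B", "VH2", "VH3", "VH4", "VH5", "VH6"]  # assumed labels
VL_POOLS = ["kappa pool", "lambda pool"]  # 4 Vk and 3 Vl libraries, mixed equimolar
HCDR3_CASSETTES = ["HCDR3a", "HCDR3b"]


def vh_templates(vh_genes, merged=("VH1A", "VH1B")):
    """Six templates: VH1A and VH1B share one mixed vector preparation."""
    rest = [g for g in vh_genes if g not in merged]
    return ["+".join(merged)] + rest


def library_pools(vh_genes, vl_pools, cassettes):
    """Return (intermediate VH x VL pools, final pools with an HCDR3 cassette)."""
    templates = vh_templates(vh_genes)
    intermediate = list(product(templates, vl_pools))
    final = list(product(templates, vl_pools, cassettes))
    return intermediate, final


if __name__ == "__main__":
    intermediate, final = library_pools(VH_MASTER, VL_POOLS, HCDR3_CASSETTES)
    print(len(vh_templates(VH_MASTER)), len(intermediate), len(final))  # 6 12 24
```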

Data Bank accession numbers 

The coordinates of the 14 framework models have been deposited in the RCSB Protein Data Bank, entries […], 1DH7 ([…]), 1DH8 ([…]), 1DH9 (Vλ3), 1DHA (VH1A), 1DHO (VH1B), 1DHQ (VH2), 1DHU (VH3), 1DHV (VH4), 1DHW ([…])